Site Server Move & Errors

We were down for a day! As many of you might have noticed, the website was down or not working properly for most of the day yesterday. Unfortunately, this was due to a server move that was required to get us ready for some upcoming security updates. While we did everything we could to ensure that the site move would go smoothly, things didn’t go well.

There are still some problems with the site. Certain product images are missing. Certain functions on the backend are no longer working properly. In addition, some customers who placed an order between Tuesday and Wednesday might find their orders missing from the backend and their account, as these orders were not imported properly. However, all products can be purchased and the rest of the site is working fine right now.

For those on our newsletter, we recommend checking it as we’ll be sending out a quick flash sale for those inconvenienced by the site going down.

A gorilla, a dog and a horse walk into a bar…

Tell me if you’ve heard this one before.

So, a gorilla, a dog and a horse walk into a bar. What do you think they see? For each, their perspectives would be so different – from height differences to colour ranges to their ability to manipulate objects in the bar. How would you construct a bar for each of these 3 animals (if they could use it)? You’d have to have a place for the horse to stand, the gorilla to sit, the dog to lie… you’d have to have tables at 3 different heights. What if they all wanted to sit together?

Could you imagine building a retail store that could / would do that? It sounds like a bar you’d only find in science fiction…

Yet if you own an online store, you commit to doing exactly that every day.

There are currently 4 major browsers in use – Safari, Firefox, Chrome and Internet Explorer. Each of these browsers has a minimum of 3 widely used versions, some more. Each version will ‘view’ your website slightly differently, and at times each browser will process code vastly differently.

On top of that, people view your website from multiple devices – mobile phones, tablets, mini-tablets, desktop computers, etc.  Each of these will have different screen resolutions and screen sizes – you’ll need to code for that too – individually quite often for each of the above browser versions.

Then add the plugins or add-ons that are available for each browser. Have a Norton Antivirus system scanning pages? How about Adblock? Each of those could affect how the browser handles the webpage yet again.
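To put a rough number on that, here is a minimal sketch in Python that counts the combinations. The browser list comes from the post; the version count of 3 is the stated minimum, and the device and extension lists are simplified assumptions for illustration rather than a real test plan.

```python
from itertools import product

# Browsers named in the post; everything else below is an illustrative assumption.
BROWSERS = ["Safari", "Firefox", "Chrome", "Internet Explorer"]
VERSIONS_PER_BROWSER = 3  # the post says "a minimum of 3" per browser
DEVICES = ["mobile phone", "tablet", "mini-tablet", "desktop"]
EXTENSION_SETUPS = ["none", "antivirus page scanner", "ad blocker"]


def build_test_matrix():
    """Return every (browser, version slot, device, extension) combination."""
    version_slots = range(1, VERSIONS_PER_BROWSER + 1)
    return list(product(BROWSERS, version_slots, DEVICES, EXTENSION_SETUPS))


matrix = build_test_matrix()
print(len(matrix))  # 4 * 3 * 4 * 3 = 144 combinations
```

Even with these conservative numbers, that is 144 distinct environments a single page could be viewed in.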

Replication and Fixes

That’s why if you hit a bug on a site and you complain about it, it might never get fixed – or it could take a while to get fixed.  The simple truth is that as a small business, we have to triage.

  • What’s the problem?
  • Is this replicable?
  • If not, is it in a critical area of the site?
  • How many people would / could this affect?
  • What browser / operating system / device are they using?
  • Has this been reported before?
  • Is this something we can fix?

There are times when a problem comes in and all we do is sit on it.  It happened to you – okay.  Maybe it’s just you – let’s see if we can replicate.  If we can’t, maybe it’s just that user.  Sometimes, the user’s not a gorilla, a dog or a horse.  Sometimes, the user might be a porpoise.

Bugs, Bugs, Bugs

I thought I’d quickly write a little explanation of what happens when a customer (or we) find a bug in the website.   Most of the time, the process goes through the following stages:

  1. Awareness
  2. Logging
  3. Replication
  4. Triage
  5. Coding Fixes
  6. Testing
  7. Deployment

Let’s discuss them in more detail:

Awareness

Pretty self-explanatory: a customer or one of us comes across a bug, and we get told about it.

Logging

Once a bug comes to our attention, we log the bug information (or what is mentioned to us as a bug). In some cases, the ‘bug’ is not a bug but a feature or a core functionality – e.g. we don’t store credit cards or allow customers to edit orders themselves once an order is placed.

Replication

Time to see if we can replicate the bug. A good 20 – 30% of all bugs reported to us are not replicable. Whether it’s due to different browsers or operating systems, specific browser extensions creating conflicts, or even the server / network the customer is on, we are not able to replicate the exact environment that caused the bug.

If we are not able to replicate a bug, we can’t fix it.   Thus the log – we keep a log on this issue, see if it (or something close to it) happens again.  If enough people manage to replicate this, quite often we are able to acquire sufficient information from the various individuals to finally replicate the bug. Then it’s on to the next step…

Triage

How big an issue is this bug? Bugs are assessed on a variety of factors:

  • number of individuals affected
  • where in the checkout process this is happening (a checkout bug is much more important than one in the article pages)
  • number of functions it affects
  • other bugs that have not been fixed
  • complexity of problem (if it’s something I could fix compared to a professional developer)

Once the assessment is done, we slot it into our ticketing system with our developers.
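As a rough illustration of how those factors can be weighed against each other, here is a minimal Python sketch. The `BugReport` fields, the weights and the sample bugs are all hypothetical; we don’t actually run a formula like this, it just shows the idea of ranking new bugs against the ones still waiting.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    title: str
    people_affected: int
    in_checkout: bool
    functions_affected: int
    fixable_in_house: bool


def triage_score(bug: BugReport) -> int:
    """Higher score means fix sooner. The weights are made up for illustration."""
    score = min(bug.people_affected, 50)         # cap so one factor cannot dominate
    score += 40 if bug.in_checkout else 0        # checkout bugs matter most
    score += 5 * bug.functions_affected
    score += 10 if bug.fixable_in_house else 0   # quick wins skip the developer queue
    return score


bugs = [
    BugReport("Typo on article page", 3, False, 1, True),
    BugReport("Checkout button unresponsive", 12, True, 2, False),
]

# Sorting the whole backlog handles the "other bugs that have not been fixed" factor.
queue = sorted(bugs, key=triage_score, reverse=True)
for bug in queue:
    print(triage_score(bug), bug.title)
```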

Coding Fixes

Next up are the fixes and coding. Depending on the complexity of the problem, either I or our developers will work on the problem. If it’s an issue that needs our developers, we normally have to wait due to their workload. This can often cause long delays. Unfortunately, finding competent developers whom we can trust is difficult.

Testing

Once a fix has been made, we have to test it.  Obviously the developers have tested it, but to ensure the site does not break we generally do testing ourselves as well.  This often can bring up new problems, so off we go back to coding fixes till the fix passes.

Deployment

Finally, we deploy the fixes.  At this point, we do one last test to make sure the bug is fixed and nothing else breaks.  Once that happens, we are good to go.  We keep an eye on the problem, just in case it crops up again, but generally it should be fixed and we’re good to go fix another bug.

Server Down! Server Down!

As many of you know, the site went down on Monday when the sale went live. The load on the site tripled and a number of poorly coded portions started acting up. For that, we do apologise to everyone, especially those who missed out on games they wished to purchase because of the site going down.

We’ve now fixed numerous issues, highlighted some of the other problems for fixing and introduced some new features to distribute the load. It’s actually improved the overall site performance, even with the increased load we’re still seeing.

With that said, I’m going to get (mildly) technical about what happened.  If you are not interested in things like that, you should stop reading now.


Interac Online – New Payment Method

As you have probably noticed, we are now able to take Interac Online on the site as another payment method. It’s something that we have been considering for a while, but it’s taken a bit to get the system running. It’s taken much longer than I would have liked (as in, missing the Christmas rush) to get it up, but there are reasons for that.

Interac Online – Only Valid for 4 Banks

Before I get into that, I just wanted to clarify that Interac Online is only valid for the following banks:

  • BMO (Bank of Montreal)
  • Scotiabank
  • RBC
  • TD Canada Trust

All other banks and financial institutions do not support the system as yet. It’s one of the reasons we hesitated on adding it for so long. Hopefully, the other banks will integrate themselves soon.

The Process

Getting Interac set up takes quite a bit of work when it’s an online system. Here’s an abbreviated list of steps:

  • Apply for and be approved for an Interac account
  • Find and purchase an Interac Online Module
  • Install module on staging site
  • Test module for errors / compatibility issues and fix
  • Request Interac approval of IP address and Interac URLs on staging site
  • Check and fix errors
  • Send required documentation to Interac Online for approval
  • Fix error / requirements
  • Receive approval for staging
  • Request Interac approval for new Interac URLs on production
  • Go live, check for problems
  • Fix new problems on production site that weren’t viable to test on staging
  • Finally go live (again) with finished product

Yup, that’s a lot of points and work. And at least 90% of that requires development time. Truthfully, if I had known how much time this would take (and cost!), I’d never have bothered to go ahead with this project. The sheer volume of work (most of it make-work too) and the complex procedures have wiped out any potential cost savings of going live with this project (and then some!).

It’s no real wonder that most sites haven’t bothered to integrate Interac.  Between the large amount of work involved integrating an approved module into your site and the limited use, I doubt Interac Online will become a common payment method.  At least not till it gets fixed.

Ah well, it’s live now and the cost is sunk.

Running the Site’s Backend – Process Progression

Over the years, the how and why of managing the website’s backend – the files and databases – has seen a gradual progression to more complex methods. I think in many ways it showcases the common route smaller e-commerce / online businesses take as their processes mature, so I figured I’d write it up.

Single Site – Everything Live

In the beginning, we had a single site with everything live on the site. Any changes we made were automatically on the website, as we had to test changes ‘live’. Bugs, code fixes, new content – it all went up live. This meant that we had to be careful when we started deploying code and keep all the back-up files on our computer in case something went wrong and we couldn’t figure out the code fix quickly. On the other hand, it also meant that there was only one site to ever worry about and everyone who worked on it had access to the same files (mostly – see below for potential problems).

This works fine if you don’t mind deploying code and fixes late at night or when you know there are few customers around. It’s fine if you don’t have a lot of customers or don’t have a lot of big changes to do, but it can be a mess if you have a ton of customers at any time of the day or, worse, are trying to install a large upgrade / expansion / module.

Oh, the other major issue – all your developers (if you work with more than one) have access to all your ‘real’ databases including customer information. A potential privacy problem.

Staging and Production Sites

A staging site is a ‘fake’ website that (in theory) is exactly the same as your production site. The staging site can be populated with ‘fake’ database information, reducing privacy problems while allowing you to continue to test code changes. In addition, because the site is not live, your developers can put up a partial fix, test it out and then come back to it at a later date (or leave you to do a test). Timing becomes less of an issue because the staging site can be broken without affecting front-end sales.

Once a change is considered production ready, you can then download the changed files and send them over to the live site. This is generally a manual process and one that you have to do yourself: part of the reason you implemented this entire process was privacy, so it makes no sense to hand your developer access to the live site just to deploy the fix.
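For the ‘fake’ database information, a minimal Python sketch of the idea looks something like this. The field names and name lists are hypothetical and far simpler than a real store’s customer table; the point is only that nothing in the staging data comes from a real customer.

```python
import random

# Placeholder names; nothing here is derived from real customer data.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Casey", "Riley"]
LAST_NAMES = ["Smith", "Lee", "Garcia", "Brown", "Singh"]


def make_fake_customers(count: int, seed: int = 0) -> list[dict]:
    """Generate repeatable fake customer rows for a staging database."""
    rng = random.Random(seed)  # fixed seed so every rebuild gives the same rows
    customers = []
    for customer_id in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        customers.append({
            "id": customer_id,
            "name": f"{first} {last}",
            # .invalid is a reserved top-level domain, so no mail can ever be delivered
            "email": f"{first.lower()}.{last.lower()}.{customer_id}@example.invalid",
            "orders": rng.randint(0, 5),
        })
    return customers


staging_customers = make_fake_customers(3)
for row in staging_customers:
    print(row)
```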

Of course, as any IT person will tell you – just because it worked on the staging site doesn’t mean it will work on production. Sometimes that means a bit of scrambling, but it’s a lot less likely to be a problem. We’ve been doing this process for the last 3 years or so, bumbling our way through multiple sites, trying to remember to take backups as necessary and keeping multiple versions of the files on our home computer.

Developer, Staging, Production & Version Control

We’ve recently grown up and moved to a more complex system with 3 sites and version control.

The Developer sites reside on the developer’s server where they test code.  It’s the working version of the site where all the changes are tested in multiple versions till a ‘good’ fix is ready.

Then the ‘good’ fix is sent to Staging, where it’s deployed. Here, I do the testing to ensure there are no bugs that the developer has missed. If there are, I send the bug report to the developer, who goes back to working on the Developer site before uploading the fix. If there aren’t, we deploy to Production.

It’s very similar to the above method, except for the addition of Version Control. We use Springloops for our version control system and couldn’t be happier.

Version control systems do a few important things for us:

a) it automatically keeps a repository of all files – old and new. It keeps dates and information about changes so we can ‘roll back’ to an older version with just a click of a button. No more hunting for files and hoping we had backed them up properly; it’s all done.

b) deployment of code can be set up to be automatic to Staging servers and manual to Production servers, while keeping deployment simple. Quite literally, a click of a button again – so no more worrying if we had missed a file. The exact same set of files gets sent to both, so it removes ‘human error’ from the equation.

c) it allows multiple developers to work on the site at the same time.  Even if a pair of developers download the same file and make different changes, the software will show and indicate any conflicts.  This way, no one developer’s work is ‘over-written’ by accident.
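The conflict check in (c) can be sketched roughly as follows. This is a simplified, file-level illustration in Python with made-up file names; real version control systems compare changes line by line and can often merge them automatically, so don’t read this as how Springloops itself works.

```python
def find_conflicts(base: dict, dev_a: dict, dev_b: dict) -> list:
    """Return files that both developers changed, in different ways, from the base copy.

    Each dict maps a file path to that file's contents. A developer's dict only
    needs to contain the files they touched. Files newly added by both
    developers are ignored in this simplified sketch.
    """
    conflicts = []
    for path, original in base.items():
        version_a = dev_a.get(path, original)
        version_b = dev_b.get(path, original)
        both_changed = version_a != original and version_b != original
        if both_changed and version_a != version_b:
            conflicts.append(path)
    return sorted(conflicts)


base = {"checkout.php": "v1", "cart.php": "v1", "footer.php": "v1"}
dev_a = {"checkout.php": "v2-a", "cart.php": "v2"}
dev_b = {"checkout.php": "v2-b", "footer.php": "v2"}

print(find_conflicts(base, dev_a, dev_b))  # only checkout.php was changed by both
```

Files that only one developer touched (cart.php and footer.php here) go through untouched; only the overlap is flagged for a human to sort out.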

d) it restricts access even further.  With Springloops, we provide access to the repository but not to the actual FTP site. It also lets us invite multiple developers and kick them out easily while keeping track of all the files they’ve touched.

Truthfully, I could not be happier that we found this solution. It allows us to get more changes done with new modules and to roll out changes more easily. It’s something I’d recommend to anyone with a website that sees numerous changes.

Security – Trials & Tribulations

As a business, one of the greatest fears is a security breach that exposes customer financial information.  It’s a nightmare; since being hit by something like this could potentially cripple a business.  We recently had a bit of a scare when 2 customers commented that fraudulent activity had occurred on their credit cards soon after placing an order with us.  Not surprisingly, we decided to conduct a full audit of the site and in the interest of transparency felt we should also write about it here on the blog.

To skip to the end first – there is no security breach on the site.

Background

To understand the story, it’s worth discussing the security procedures that are in place to keep a customer’s financial information safe.

We do not store credit card information

Those of you who have ever had to edit your order will notice that the edited order generally ends up saying ‘Check / Money Order’. The only time an edited order would say something else would be if the customer had called in to provide us the credit card information again. This is because we do not store or have access to a credit card once the order is placed.

When an order is placed on the site, the credit card information is sent in an encrypted format to the site and from there, to the credit card gateway who authorises the charge on the card.  We are then provided a token indicating the authorisation for our records.  This allows us to charge a card for the authorised order amount only.  The only credit card information that we store is the card type, the last 4 digits and the expiry date.  None of that is sufficient to run a new charge on the card.
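In code terms, the record we keep looks roughly like the Python sketch below. The class and field names are hypothetical and this is not our actual checkout code; it only illustrates that the full card number is dropped and just the card type, last 4 digits, expiry date and the gateway’s token are retained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredPayment:
    card_type: str
    last4: str
    expiry: str
    auth_token: str  # token returned by the gateway for this one authorisation


def record_payment(card_type: str, card_number: str, expiry: str, auth_token: str) -> StoredPayment:
    """Build the record kept with the order; the full card number is never stored."""
    digits = "".join(ch for ch in card_number if ch.isdigit())
    return StoredPayment(card_type, digits[-4:], expiry, auth_token)


# 4111 1111 1111 1111 is a widely used test card number, not a real card.
stored = record_payment("Visa", "4111 1111 1111 1111", "12/30", "tok_example_123")
print(stored)
```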

With PayPal of course, all we get is the e-mail address that the payment came from.

Everything is encrypted

The Checkout Page is completely encrypted with 128-bit SSL encryption (the same method that big retailers like Amazon use, which is basically an industry standard) and anytime we access our backend, all the data passed back and forth is encrypted as well. So the card is completely secure during transit and on the site.

Regular Scans

Lastly, both our server host and our developer regularly run scans to ensure that there aren’t any viruses / malware / etc. sitting among our files.

The Incidents

Once in a while, a customer contacts us to say that they have had to change their credit card information due to fraud. We generally take note of it and run a quick security assessment, but due to the above ongoing security procedures it’s generally not likely to have originated from our site.

This time a pair of customers contacted us separately in a very short period, both with very similar stories – initial orders placed very close together, fraudulent activity on the same day, both having orders placed on our site.   That seriously concerned us, enough that we decided to shift gears and focus on a security audit.

The Audit

Since both customers placed their orders remotely, we knew it couldn’t be an HR issue (remember, there’s literally no way for us to get a credit card number unless a customer calls us to place the order over the phone). As such, we knew to focus our attention on the site and the site code.

We took the audit on in 3 parts.

1) External Audit

We first ran the site through a number of external company verifications (e.g. McAfee, Google’s Webmaster Tools, etc.) to see if the problem was picked up by them. This ensured that no external scripts were being loaded from the site which could have caused problems.

2) Automated File Review

We then began an audit of the files in the site and database. This was an automated process that reviewed every file on the site to ensure that it was meant to be there, and looked for specific known malicious code.
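For the curious, the general shape of such a review can be sketched in Python like this. It is not the tool that was actually run; the manifest of known-good file hashes and the two byte patterns are assumptions chosen for illustration, and a real scanner would carry a far longer signature list.

```python
import hashlib
from pathlib import Path

# Example signatures only; these two are illustrative, not a real scanner's list.
SUSPICIOUS_PATTERNS = [b"eval(base64_decode(", b"gzinflate("]


def audit_site(root: Path, manifest: dict) -> dict:
    """Compare every file under root against a manifest of known-good SHA-256 hashes.

    manifest maps a relative path (with forward slashes) to the expected hash.
    Returns lists of unexpected, modified and suspicious files.
    """
    findings = {"unexpected": [], "modified": [], "suspicious": []}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        data = path.read_bytes()
        if rel not in manifest:
            findings["unexpected"].append(rel)   # file is not meant to be there
        elif hashlib.sha256(data).hexdigest() != manifest[rel]:
            findings["modified"].append(rel)     # contents differ from the known-good copy
        if any(pattern in data for pattern in SUSPICIOUS_PATTERNS):
            findings["suspicious"].append(rel)   # matches a known malicious snippet
    return findings
```

You would call `audit_site(Path("/path/to/site"), manifest)` with a manifest built from a clean copy of the code, then review anything that lands in the three lists by hand.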

3) Eyes on Code

Lastly, we put eyes on the code.  Every single file and script that was involved in the process of providing the checkout page on Starlit Citadel was reviewed. Since this is the only location where the credit card information is input, this was the most important ‘fail point’ and thus the extra scrutiny.

In all three tests, we could not locate any potential security problems. While there is never any guarantee, it’s extremely unlikely that we had a breach in security. It still is something that had to be done, and I’m open to any other suggestions for things we can do to improve security if you have any. Overall though, it made for a couple of extremely stressful and expensive days.