The Year 2038 Problem

by Roger M. Wilcox
last updated 23-October-2003

What's so special about 2038?

Early UNIX programmers had quite a sense of humor.  In their documentation for the tunefs utility, a command-line program that fine-tuned the file system on the machine's hard disk, a note at the end reads "You can tune a file system, but you can't tune a fish."  A later generation of UNIX authors, fearful that stuffy, humorless corporate drones would remove this cherished pun, added a programmer's comment inside the documentation's source code that read, "Take this out and a UNIX demon will dog your steps until the time_t's wrap around!"

On January 19, 2038, that is precisely what's going to happen.

For the uninitiated, time_t is a data type used by C and C++ programs to represent dates and times internally.  (You Windows programmers out there might also recognize it as the basis for the CTime and CTimeSpan classes in MFC.)  time_t is actually just an integer, a whole number, that counts the number of seconds since January 1, 1970 at 12:00 AM Greenwich Mean Time.  A time_t value of 0 would be 12:00:00 AM (exactly midnight) 1-Jan-1970, a time_t value of 1 would be 12:00:01 AM (one second after midnight) 1-Jan-1970, etc.  Since one year lasts for about 31 500 000 seconds, the time_t representation of January 1, 1971 is about 31 500 000, the time_t representation for January 1, 1972 is about 63 000 000, et cetera.

If you're confused, here are some example times and their exact time_t representations:

Date & time                     time_t representation
1-Jan-1970, 12:00:00 AM GMT                 0
1-Jan-1970, 12:00:01 AM GMT                 1
1-Jan-1970, 12:01:00 AM GMT                60
1-Jan-1970, 01:00:00 AM GMT             3 600
2-Jan-1970, 12:00:00 AM GMT            86 400
3-Jan-1970, 12:00:00 AM GMT           172 800
1-Feb-1970, 12:00:00 AM GMT         2 678 400
1-Mar-1970, 12:00:00 AM GMT         5 097 600
1-Jan-1971, 12:00:00 AM GMT        31 536 000
1-Jan-1972, 12:00:00 AM GMT        63 072 000
1-Jan-2003, 12:00:00 AM GMT     1 041 379 200
1-Jan-2038, 12:00:00 AM GMT     2 145 916 800
19-Jan-2038, 03:14:07 AM GMT    2 147 483 647
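
If you'd rather check these figures yourself than take my word for them, here is a minimal sketch in C.  It relies on the standard gmtime() function from the C library; the helper name to_gmt() is made up for this illustration and is not part of any library.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: breaks a time_t into GMT calendar fields.
   Returns 0 on success, or -1 if the C library cannot convert the value. */
int to_gmt(time_t t, struct tm *out)
{
    struct tm *parts = gmtime(&t);  /* standard C; may return NULL on failure */
    if (parts == NULL)
        return -1;
    *out = *parts;                  /* copy out of gmtime()'s internal buffer */
    return 0;
}
```

Feeding it 1041379200 should fill in tm_year = 103 (years since 1900), tm_mon = 0 (January) and tm_mday = 1, which matches the 1-Jan-2003 row of the table.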

By the year 2038, the time_t representation for the current time will be over 2 140 000 000.  And that's the problem.  A modern 32-bit computer stores a "signed integer" data type, such as time_t, in 32 bits.  The first of these bits is used for the positive/negative sign of the integer, while the remaining 31 bits are used to store the number itself.  The highest number these 31 data bits can store works out to exactly 2 147 483 647.  A time_t value of this exact number, 2 147 483 647, represents January 19, 2038, at 7 seconds past 3:14 AM Greenwich Mean Time.  So, at 3:14:07 AM GMT on that fateful day, every time_t used in a 32-bit C or C++ program will reach its upper limit.
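
The arithmetic behind that date can be sketched in a few lines of C.  The struct and function names below are hypothetical, chosen only for this example; the one thing taken from the standard library is INT32_MAX, the largest value a signed 32-bit integer can hold.

```c
#include <stdint.h>

/* Hypothetical type and helper, named here for illustration only. */
struct elapsed {
    int32_t days;
    int hours, minutes, seconds;
};

/* Splits a non-negative count of seconds into whole days plus a time of day. */
struct elapsed split_seconds(int32_t total)
{
    struct elapsed e;
    e.days    = total / 86400;            /* 86 400 seconds per day */
    e.hours   = (total % 86400) / 3600;   /* whole hours left over   */
    e.minutes = (total % 3600) / 60;      /* whole minutes left over */
    e.seconds = total % 60;               /* seconds left over       */
    return e;
}
```

split_seconds(INT32_MAX) comes out to 24 855 days, 3 hours, 14 minutes, and 7 seconds.  Day number 24 855 after 1-Jan-1970 is 19-Jan-2038, which is where the 3:14:07 AM figure comes from.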

One second later, on 19-January-2038 at 3:14:08 AM GMT, disaster strikes.


What will the time_t's do when this happens?

Signed integers stored in a computer don't behave exactly like an automobile's odometer.  When a 5-digit odometer reaches 99 999 miles, and then the driver goes one extra mile, the digits all "turn over" to 00000.  But when a signed integer reaches its maximum value and then gets incremented, it wraps around to its lowest possible negative value.  (The reasons for this have to do with a binary notation called "two's complement"; I won't bore you with the details here.)  This means a 32-bit signed integer, such as a time_t, set to its maximum value of 2 147 483 647 and then incremented by 1, will become -2 147 483 648.  Note that "-" sign at the beginning of this large number.  A time_t value of -2 147 483 648 would represent December 13, 1901 at 8:45:52 PM GMT.
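
Here is a minimal sketch of that wraparound in C.  One caveat: the C language itself calls overflowing a signed integer "undefined behavior", so this sketch does the addition on an unsigned copy and converts back.  That conversion is implementation-defined in older C standards, but it produces the two's-complement result described above on the mainstream compilers I know of.  The function name next_tick_32() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: what a signed 32-bit seconds counter holds one
   second later.  The add happens in unsigned arithmetic, which wraps
   predictably; the cast back to int32_t assumes a two's-complement
   machine, which is an assumption about the platform, not a guarantee
   of every C standard. */
int32_t next_tick_32(int32_t now)
{
    return (int32_t)((uint32_t)now + 1u);
}
```

next_tick_32(INT32_MAX) should give INT32_MIN, that is, 2 147 483 647 turning into -2 147 483 648.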

So, if all goes normally, 19-January-2038 will suddenly become 13-December-1901 in every time_t across the globe, and every date calculation based on this figure will go haywire.  And it gets worse.  Most of the support functions that use the time_t data type cannot handle negative time_t values at all.  They simply fail and return an error code.  Now, most "good" C and C++ programmers know that they are supposed to write their programs in such a way that each function call is checked for an error return, so that the program will still behave nicely even when things don't go as planned.  But all too often, the simple, basic, everyday functions they call will "almost never" return an error code, so an error condition simply isn't checked for.  It would be too tedious to check everywhere; and besides, the extremely rare conditions that result in the function's failure would "hardly ever" happen in the real world.  (Programmers: when was the last time you checked the return value from printf() or malloc()?)  When one of the time_t support functions fails, the failure might not even be detected by the program calling it, and more often than not this means the calling program will crash.  Spectacularly.
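
For the programmers in the audience, this is roughly what checking one of those return values looks like.  It's a sketch, not a prescription: gmtime() is the standard C function, which is allowed to return a null pointer when it can't convert a value, while year_of() is a hypothetical wrapper invented for this example.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical wrapper: stores the calendar year of t in *year.
   Returns 0 on success, or -1 if the C library refuses the value. */
int year_of(time_t t, int *year)
{
    struct tm *parts = gmtime(&t);
    if (parts == NULL)              /* the check that so often gets skipped */
        return -1;
    *year = parts->tm_year + 1900;  /* tm_year counts years since 1900 */
    return 0;
}
```

A caller that ignores the -1 and uses *year anyway is reading a value that was never set, which is exactly the kind of undetected failure described above.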


Will fixing Year 2000 bugs help fix Year 2038 bugs?

No.

time_t is never, ever at fault in any Year 2000 bug.  Year 2000 bugs usually involve one of three things: The user interface, i.e., what year do you assume if the user types in "00"; a database where only the last two digits are stored, i.e., what year do you assume if the database entry contains a 00 for its year; and, in rare instances, the use of data items (such as the struct tm data structure's tm_year member in a C or C++ program) which store the number of years since 1900 and can result in displays like "19100" for the year 2000.
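
The "19100" display is easy to reproduce.  In this sketch the two function names are hypothetical; the only library call is the standard snprintf(), and tm_year is passed in as a plain integer so the example doesn't depend on the current date.

```c
#include <stdio.h>

/* The classic mistake: gluing a literal "19" onto tm_year. */
int format_year_buggy(int tm_year, char *buf, size_t size)
{
    return snprintf(buf, size, "19%d", tm_year);
}

/* The fix: tm_year counts years since 1900, so add 1900 to it. */
int format_year_fixed(int tm_year, char *buf, size_t size)
{
    return snprintf(buf, size, "%d", 1900 + tm_year);
}
```

With tm_year = 99 both versions produce "1999".  With tm_year = 100 the buggy version produces "19100" and the fixed one produces "2000".  Note that no time_t is involved anywhere.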

Year 2038 bugs, on the other hand, occur when a program reads in a date and carries it around from one part of itself to another.

You see, time_t is a convenient way to handle dates and times inside a C or C++ program.  For example, suppose a program reads in two dates, date A and date B, and wants to know which date comes later.  A program storing these dates as days, months, and years would first have to compare the years, then compare the months if the years were the same, then compare the days if the months were the same, for a total of 3 comparison operations.  A program using time_t's would only have to compare the two time_t values against each other, for a total of 1 comparison operation.  Additionally, adding one day to a date is much easier with a time_t than having to add 1 to the day, then see if that puts you past the end of the month, then increase the month and set the day back to 01 if so, then see if that puts you past the end of the year, et cetera.  If dates are manipulated often, the advantage of using time_t's quickly becomes obvious.  Only after the program is done manipulating its time_t dates, and wants to display them to the user or store them in a database, will they have to be converted back into days, months, and years.
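
To make the comparison concrete, here is a sketch of both approaches side by side.  The struct ymd type and all three function names are hypothetical, and the one-day addition assumes that time_t counts seconds, which is true on POSIX systems but is not promised by the C standard itself.

```c
#include <time.h>

/* Hypothetical calendar-style date, for illustration only. */
struct ymd { int year, month, day; };

/* Up to three comparisons: years, then months, then days.
   Returns a negative, zero, or positive value, like strcmp(). */
int compare_ymd(struct ymd a, struct ymd b)
{
    if (a.year != b.year)
        return a.year < b.year ? -1 : 1;
    if (a.month != b.month)
        return a.month < b.month ? -1 : 1;
    if (a.day != b.day)
        return a.day < b.day ? -1 : 1;
    return 0;
}

/* With time_t, the same question is a single integer comparison. */
int compare_time(time_t a, time_t b)
{
    return (a > b) - (a < b);
}

/* Adding a day is one addition (assumes time_t is in seconds). */
time_t add_one_day(time_t t)
{
    return t + 86400;
}
```

add_one_day() never needs to know how long the current month is or whether the year is a leap year; all of that bookkeeping is deferred until the value is converted back for display.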

So, even if you were to fix every Year 2000 Bug in a program in such a way that users and databases could use years as large as 9999, it wouldn't even touch any of the Year 2038 Bugs lurking within the same program.


The Problem with Pooh-Poohing

Admittedly, some of my colleagues don't feel that this impending disaster will strike too many people.  They reason that, by the time 2038 rolls around, most programs will be running on 64-bit or even 128-bit computers.  In a 64-bit program, a time_t could represent any date and time in the future out to 292 000 000 000 A.D., which is about 20 times the currently estimated age of the universe.
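
That 292-billion-year figure can be checked with one division.  This sketch assumes an average Gregorian year of 365.2425 days (31 556 952 seconds); the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Average Gregorian year: 365.2425 days of 86 400 seconds each. */
#define SECONDS_PER_YEAR INT64_C(31556952)

/* Hypothetical helper: how many whole years after 1970 a seconds
   counter lasts if its largest value is max_seconds. */
int64_t years_after_1970(int64_t max_seconds)
{
    return max_seconds / SECONDS_PER_YEAR;
}
```

years_after_1970(INT32_MAX) is 68, which lands in 2038, while years_after_1970(INT64_MAX) is a little over 292 billion.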

The problem with this kind of optimism is the same root problem behind most of the Year 2000 concerns that plagued the software industry in previous years: Legacy Code.  Developing a new piece of software is an expensive and time-consuming process.  It's much easier to take an existing program that we know works, and code one or two new features into it, than it is to throw the earlier program out and write a new one from scratch.  This process of enhancing and maintaining "legacy" source code can go on for years, or even decades.  The MS-DOS layer still at the heart of Microsoft's Windows 98 and Windows ME was first written in 1981, and even it was a quick "port" (without many changes) of an earlier operating system called CP/M, which was written in the 1970s.  Much of the financial software hit by the various Year 2000 bugs had also been used and maintained since the 1970s, when the year 2000 was still thought of as more of a science fiction movie title than an actual impending future.  Surely, if this software had been written in the 1990s its Year 2000 Compliance would have been crucial to its authors, and it would have been designed with the year 2000 in mind.  But it wasn't.

I should also mention that computer designers can no longer afford to make a "clean break" with the computer architectures of the past.  No one wants to buy a new kind of PC if it doesn't run all their old PC's programs.  So, just as the new generation of Microsoft Windows operating systems has to be able to run the old 16-bit programs written for Windows 3 or MS-DOS, so any new PC architecture will have to be able to run existing 32-bit programs in some kind of "backward compatibility" mode.

Even if every PC in the year 2038 has a 64-bit CPU, there will be a lot of older 32-bit programs running on them.  And the larger, more complex, and more important any program is, the better the chances that it'll be one of these old 32-bit programs.


What about making time_t unsigned in 32-bit software?

One of the quick-fixes that has been suggested for existing 32-bit software is to re-define time_t as an unsigned integer instead of a signed integer.  An unsigned integer doesn't have to waste one of its bits to store the plus/minus sign for the number it represents.  This doubles the range of numbers it can store.  Whereas a signed 32-bit integer can only go up to 2 147 483 647, an unsigned 32-bit integer can go all the way up to 4 294 967 295.  A time_t of this magnitude could represent any date and time from 12:00:00 AM 1-Jan-1970 all the way out to 6:28:15 AM 7-Feb-2106, surely giving us more than enough years for 64-bit software to dominate the planet.

It sounds like a good idea at first.  We already know that most of the standard time_t handling functions don't accept negative time_t values anyway, so why not just make time_t into a data type that only represents positive numbers?

Well, there's a problem.  time_t isn't just used to store absolute dates and times.  It's also used, in many applications, to store differences between two date/time values, i.e. to answer the question of "how much time is there between date A and date B?".  (MFC's CTimeSpan class is one notorious example.)  In these cases, we do need time_t to allow negative values.  It is entirely possible that date B comes before date A.  Blindly changing time_t to an unsigned integer will, in these parts of a program, make the code unusable.

Changing time_t to an unsigned integer would, in most programs, be robbing Peter to pay Paul.  You'd fix one set of bugs (the Year 2038 Problem) only to introduce a whole new set (time differences not being computed properly).
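
A short sketch shows the new bug in action.  The function names are hypothetical, and the 32-bit integer types stand in for a signed and an unsigned 32-bit time_t; the signed version assumes the two dates are close enough together that the subtraction itself doesn't overflow.

```c
#include <stdint.h>

/* Difference between two dates held in a signed 32-bit time_t. */
int32_t diff_signed(int32_t a, int32_t b)
{
    return a - b;   /* negative when a is the earlier date */
}

/* The same subtraction if time_t were redefined as unsigned. */
uint32_t diff_unsigned(uint32_t a, uint32_t b)
{
    return a - b;   /* cannot go negative: wraps modulo 2^32 instead */
}
```

Take date A as 2-Jan-1970 (86 400) and date B as 3-Jan-1970 (172 800).  diff_signed(86400, 172800) gives -86 400, correctly saying that A is one day before B.  diff_unsigned(86400, 172800) gives 4 294 880 896, which claims that A is more than 136 years after B.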


Not very obvious, is it?

The greatest danger with the Year 2038 Problem is its invisibility.  The more-famous Year 2000 is a big, round number; it only takes a few seconds of thought, even for a computer-illiterate person, to imagine what might happen when 1999 turns into 2000.  But January 19, 2038 is not nearly as obvious.  Software companies will probably not think of trying out a Year 2038 scenario before doomsday strikes.  Of course, there will be some warning ahead of time.  Scheduling software, billing programs, personal reminder calendars, and other such pieces of code that set dates in the near future will fail as soon as one of their target dates exceeds 19-Jan-2038, assuming a time_t is used to store them.
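
A test department that did think of it could start with something as small as this.  It's only a sketch: fits_in_time32() is a hypothetical name, and it assumes the target date has already been computed as a 64-bit count of seconds since 1-Jan-1970.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical test helper: would this target date survive being
   stored in a signed 32-bit time_t? */
bool fits_in_time32(int64_t seconds_since_1970)
{
    return seconds_since_1970 >= INT32_MIN
        && seconds_since_1970 <= INT32_MAX;
}
```

fits_in_time32(2147483647) is true; fits_in_time32(2147483648), one second later, is false.  Any scheduled date for which this returns false is one that a 32-bit time_t will get wrong today, long before 2038 arrives.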

But the healthy paranoia that surrounded the search for Year 2000 bugs will be absent.  Most software development departments are managed by people with little or no programming experience.  (Dilbert's boss is an extreme case of this, but computer-illiterate software managers are more common than you might think.)  It's the managers and their V.P.s that have to think up long-term plans and worst-case scenarios, and insist that their products be tested for them.  Testing for dates beyond January 19, 2038 simply might not occur to them.  And, perhaps worse, the parts of their software they had to fix for Year 2000 Compliance will be completely different from the parts of their programs that will fail on 19-Jan-2038, so fixing one problem will not fix the other.


email the author at rogermw@ix.netcom.com