
Toyota car plant outage shows database capacity planning is vital


20 September 2023


Insufficient disk space on database servers was the cause of Toyota car production grinding to a halt for 36 hours at 14 plants in Japan late in August.

But how does such a situation arise? The most likely explanation is a failure in database capacity planning, and specifically a failure to provide sufficient storage.

That’s an activity that often goes under the radar of the C-suite, but when things go wrong the result is keenly felt. Any estimate of what a day and a half of lost production at 14 Toyota plants is worth must be pretty hefty.

So, what happened? Toyota put out a press release that stated a cyber attack was not to blame. Instead, it said: “The system malfunction was caused by the unavailability of some multiple servers that process parts orders,” which is a critical part of the lean/just-in-time production system for which Toyota is famous. “[R]egular maintenance work was performed on August 27, the day before the malfunction occurred. During the maintenance procedure, data that had accumulated in the database was deleted and organised, and an error occurred due to insufficient disk space, causing the system to stop.”

The press release went on to say: “Since these servers were running on the same system, a similar failure occurred in the backup function.”

Key here is what looks like a failure in database capacity planning. That would be the likely issue if data in a database was “deleted and organised” (presumably meaning “reorganised”), because deleting data does not make the database file smaller and may even cause it to grow.

In addition, the basic risk mitigation measure of separating production and backup storage also seems to have been overlooked, but we’ll leave that aside for the time being.

Database sizing – prediction of database storage needs – and capacity planning, which is the practice of ongoing provision of the required storage capacity, are core skills for database administrators.
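As a rough illustration of what ongoing capacity planning involves, the sketch below projects how long a volume will last at a given growth rate and checks whether there is room for a maintenance job that may temporarily need extra space. The function names, the linear growth model and the `headroom_factor` parameter are assumptions made for this example, not details from Toyota's systems.

```python
def days_until_full(free_bytes: int, daily_growth_bytes: int) -> float:
    """Days before the volume fills, assuming (hypothetically) linear growth."""
    if daily_growth_bytes <= 0:
        # No growth, or shrinking usage: the volume never fills under this model.
        return float("inf")
    return free_bytes / daily_growth_bytes


def has_maintenance_headroom(
    free_bytes: int, db_file_bytes: int, headroom_factor: float = 1.0
) -> bool:
    """True if free space covers a reorganisation of the database file.

    headroom_factor is an assumed safety margin: 1.0 means "enough free
    space for one full extra copy of the database file".
    """
    return free_bytes >= db_file_bytes * headroom_factor
```

In practice the free-space figure could come from `shutil.disk_usage(path).free` in the Python standard library, with the check run before any maintenance window rather than after an error has occurred.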

Capacity planning is more complex than the layperson might imagine because the size of database files does not just grow and shrink according to need without further impacts.

To start with, a (relational) database is more than a single file. “A database” does consist of a database file, in which rows and columns are allocated and multiple connected tables are defined. But there are also temporary files, such as SQL Server’s TempDB files. There are indexes, which allow rapid access to frequently used rows, log files that record all database activity, and backups created by the database and associated applications.

Also, different suppliers’ database systems have different implementations of these basic elements and can occupy different volumes of storage capacity for the same nominal database size.

A 20GB database file for 20,000 nodes (hosts, servers, etc.) in SQL Server, for example, requires 46GB when log files, TempDB files and backups are taken into account. The equivalent database file size in Oracle and PostgreSQL would take up 90GB and 26GB respectively.
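Those figures work out to total footprints of 2.3, 4.5 and 1.3 times the size of the data file for SQL Server, Oracle and PostgreSQL respectively. The sketch below derives those multipliers from the quoted numbers and applies them to another data file size. It assumes the overhead scales linearly with the data file, which is a simplification made for this example: real overhead depends on log retention, backup policy and workload.

```python
# Figures quoted above: total footprint in GB for a 20GB data file.
QUOTED_DATA_FILE_GB = 20
QUOTED_TOTAL_GB = {"SQL Server": 46, "Oracle": 90, "PostgreSQL": 26}


def footprint_multiplier(engine: str) -> float:
    """Ratio of total storage to data file size for the quoted example."""
    return QUOTED_TOTAL_GB[engine] / QUOTED_DATA_FILE_GB


def estimate_total_gb(engine: str, data_file_gb: float) -> float:
    """Rough total footprint, assuming overhead scales linearly (an assumption)."""
    return footprint_multiplier(engine) * data_file_gb


if __name__ == "__main__":
    for engine in QUOTED_TOTAL_GB:
        print(engine, footprint_multiplier(engine), estimate_total_gb(engine, 100))
```

The point of the exercise is less the exact numbers than the habit: provision for the whole footprint, not just the data file.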

The issue with deletes in a database, which Toyota identified as a cause of its problems, is that a database delete does not reduce file size. If a row is removed, its allocated space is not released. It’s just marked as unused. Database size is set when the database is originally configured and is periodically reset through its lifetime, but the key thing is that space is allocated and stays that way, primarily so that storage is not suddenly overwhelmed by huge bouts of shrinkage and growth. It’s pretty much a set-in-stone commandment of the DBA world that databases never shrink.
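The behaviour is easy to reproduce on a small scale. The sketch below uses SQLite from the Python standard library purely as a stand-in (nothing in Toyota's statement says which database it runs), and the table and file names are made up for the example. It records the file size after inserting rows, after deleting them all, and after a `VACUUM`, which is SQLite's reorganisation command.

```python
import os
import sqlite3
import tempfile


def measure_delete_and_vacuum(row_count: int = 20_000) -> dict:
    """Return the database file size in bytes at three points in time."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "orders.db")  # hypothetical file name
        conn = sqlite3.connect(path)
        # Make sure freed pages are kept in the file rather than auto-released.
        conn.execute("PRAGMA auto_vacuum = NONE")
        conn.execute(
            "CREATE TABLE parts_orders (id INTEGER PRIMARY KEY, payload TEXT)"
        )
        conn.executemany(
            "INSERT INTO parts_orders (payload) VALUES (?)",
            (("x" * 500,) for _ in range(row_count)),
        )
        conn.commit()
        sizes = {"after_insert": os.path.getsize(path)}

        # Deleting every row only marks the pages as free inside the file.
        conn.execute("DELETE FROM parts_orders")
        conn.commit()
        sizes["after_delete"] = os.path.getsize(path)

        # VACUUM rebuilds the file; only now is the space handed back.
        conn.execute("VACUUM")
        sizes["after_vacuum"] = os.path.getsize(path)
        conn.close()
        return sizes


if __name__ == "__main__":
    print(measure_delete_and_vacuum())
```

The file is no smaller after the delete and only shrinks after the rebuild. The rebuild itself works by writing a fresh copy, so SQLite's documentation warns that `VACUUM` can need as much as twice the size of the original file in free disk space. That is the kind of temporary demand that a reorganisation job can place on a nearly full disk.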

Log and temp files
