Climate Data

Optimising analyses

Overview

Teaching: 10 min
Exercises: 5 min
Questions
  • How can I make my analysis faster?

Objectives
  • Explain the basics of optimisation

  • Explain memory limits and lazy loading

  • Explore Dask and chunking

No-one likes their code to run slowly. There are a number of ways you can improve performance; the sections below cover the most common reasons calculations run slowly and what to do about them.

Reduce your workload

The fastest way to do a calculation is to not need to do it at all. Has someone else done the same calculation?

Use appropriate input data - don’t download 3-hourly data when monthly data would work fine

If needed, subset your data before performing other operations, as in the sketch below
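As an illustration (not from the lesson itself), here is a minimal sketch of subsetting first, assuming a hypothetical file precip.nc with lat, lon, and time coordinates:

```python
import xarray as xr

# Hypothetical file and coordinate names - adjust for your dataset
ds = xr.open_dataset("precip.nc")

# Subset to the region and period of interest FIRST
# (slice bounds must match the ordering of each coordinate)
subset = ds.precip.sel(lat=slice(-44, -10), lon=slice(112, 154),
                       time=slice("1990-01-01", "1999-12-31"))

# ...then do the expensive operation on the (much smaller) subset
climatology = subset.groupby("time.month").mean("time")
```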

Memory and Caches

When working with big arrays, the order in which you read memory can be important. Computer memory is (to first order) linear: multi-dimensional arrays are flattened in memory. It is quick to access data that is close together in memory, but slow to access data that is far apart, because data is read into fast cache memory in contiguous chunks.

Prefer whole-array operations to writing your own loops

As much as possible, do the same operation on every grid point and avoid conditionals (see the sketch after this list)
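As a rough sketch of what this means in practice, here is a hand-written loop-with-conditional contrasted with the equivalent whole-array operation in NumPy:

```python
import numpy as np

data = np.random.rand(1000, 1000)

# Slow: an explicit Python loop with a conditional at every grid point
flagged = np.empty_like(data)
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        if data[i, j] > 0.5:
            flagged[i, j] = 1.0
        else:
            flagged[i, j] = 0.0

# Fast: one whole-array operation replaces both the loop and the conditional
flagged = np.where(data > 0.5, 1.0, 0.0)
```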

Example: Loop access patterns

Let’s look at different ways of iterating over arrays

Notebook
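The notebook is the reference here; as a sketch of the kind of comparison it makes, iterating along the rows of a (row-major) NumPy array touches adjacent memory, while iterating down columns strides far through it:

```python
import time
import numpy as np

a = np.zeros((5000, 5000))  # NumPy arrays are row-major (C order) by default

# Fast: each step works along a row, touching adjacent memory locations
start = time.perf_counter()
for i in range(a.shape[0]):
    a[i, :] += 1
print("row order:   ", time.perf_counter() - start)

# Slow: each step works down a column, striding far through memory
start = time.perf_counter()
for j in range(a.shape[1]):
    a[:, j] += 1
print("column order:", time.perf_counter() - start)
```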

Full memory

When your computer’s memory is full, a few things can happen:

You may hear your computer’s hard disk churning as data is swapped out to disk, or processor usage may take a nosedive

Reading from and writing to disk is much, much slower than RAM

Free up memory after use - Python will automatically free memory for you when a function exits (garbage collection); in some other languages you need to free memory manually (see the sketch below)
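A small sketch of both behaviours (the file path is hypothetical):

```python
import numpy as np

def seasonal_mean(path):
    # 'big' only exists inside this function; Python frees it
    # automatically once the function returns
    big = np.load(path)  # hypothetical .npy file
    return big.mean(axis=0)

# At module level, drop the reference explicitly once you're finished
big_array = np.zeros((10000, 10000))
result = big_array.sum()
del big_array  # lets the garbage collector reclaim ~800 MB
```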

Operate on less data at once - rather than loading a bunch of files to average them together, load them one at a time and close each when done, as in the sketch below
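A sketch of this pattern, assuming hypothetical yearly files (tas_1990.nc and so on) each containing a tas variable:

```python
import xarray as xr

# Hypothetical list of yearly files - adjust paths for your data
files = ["tas_1990.nc", "tas_1991.nc", "tas_1992.nc"]

total = None
for f in files:
    # The 'with' block closes each file as soon as we're done with it
    with xr.open_dataset(f) as ds:
        annual = ds.tas.mean("time").load()  # load only this year's mean
    total = annual if total is None else total + annual

mean = total / len(files)
```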

Xarray can also automatically split data into chunks, working on only a bit of the whole field at a time. It will also do some parallelisation, but it can only read serially, so chunking won’t make reads any faster!
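A minimal sketch of chunked, lazy loading, assuming a hypothetical file and variable name (this requires Dask to be installed):

```python
import xarray as xr

# Asking for chunks returns a dask-backed array: nothing is read yet
ds = xr.open_dataset("big_ocean_file.nc", chunks={"time": 12})

# Operations build up a task graph lazily, chunk by chunk...
sst_mean = ds.sst.mean("time")

# ...and only run (in parallel, where possible) when you ask for values
result = sst_mean.compute()
```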

Netcdf also supports chunking within the file itself, which can be optimised for different access patterns (balancing quick time access against quick spatial access)
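One way to inspect a file’s on-disk chunking from Python is the netCDF4 library’s chunking() method; a sketch, with a hypothetical filename:

```python
import netCDF4

nc = netCDF4.Dataset("big_ocean_file.nc")  # hypothetical file
for name, var in nc.variables.items():
    # chunking() returns a list of per-dimension chunk sizes,
    # or the string 'contiguous' if the variable is not chunked
    print(name, var.chunking())
nc.close()
```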

Example: Chunking 0.25 degree ocean data

What are the benefits of chunking with really big files? (We’re not using OPeNDAP here!) 7 GB file: /g/data/gh5/access_cm_025-picontrol

Notebook
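As a sketch of the idea (the exact filename and variable name under that directory are assumptions):

```python
import xarray as xr

# Hypothetical filename within the directory given above
ds = xr.open_dataset("/g/data/gh5/access_cm_025-picontrol/ocean.nc",
                     chunks={"time": 1})

# A spatial mean at one time step only reads the chunks it needs,
# rather than pulling the whole 7 GB file into memory
print(ds.temp.isel(time=0).mean().compute())
```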

Use chunking to plot the mass transport through the Drake Passage

Use the tx_trans_int_z variable

See if you can combine chunking and a multi-file dataset to plot multiple years
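A possible starting point for the exercise; the file pattern, the MOM grid coordinate names (xu_ocean, yt_ocean), and the Drake Passage bounds below are all assumptions to check against the real dataset:

```python
import xarray as xr
import matplotlib.pyplot as plt

# Hypothetical file pattern - adjust to the actual files in the directory
ds = xr.open_mfdataset("/g/data/gh5/access_cm_025-picontrol/ocean_*.nc",
                       chunks={"time": 1})

# Fix a longitude through the passage (roughly 68W; an illustrative guess)
# and sum the vertically integrated zonal transport across the latitude band
transport = (ds.tx_trans_int_z
               .sel(xu_ocean=-68, method="nearest")
               .sel(yt_ocean=slice(-67, -55))
               .sum("yt_ocean"))

transport.plot()
plt.show()
```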

Key Points

  • The fastest calculation is one you don’t have to do: reuse existing results, choose appropriate-resolution input data, and subset early.

  • Memory access order matters: prefer whole-array operations to hand-written loops and conditionals.

  • When data won’t fit in memory, operate on less of it at once; Xarray and Dask can chunk data and load it lazily for you.