5 Steps to Speed Up Your Data-Analysis on a Single Core
- Author: Jonathan Striebel
- Talk: https://2022.pycon.de/program/VYS8XY/
Why speed up?
- Only speed up when it's too slow
- Don't over-optimize beforehand.
Why speed up on single core?
- You may not have parallelization available
- Parallelization needs resources [cores, memory, money]
- Code may be difficult or sometimes impossible to parallelize
- Optimizing for a single core still pays off when you parallelize later
5 Steps
1. Profiling
- Most important step
- Find which part of the code is slow
- Tools
  - yappi
  - py-spy
  - cProfile
  - pyinstrument
  - palanteer
- All tools except py-spy run inside the profiled Python process and may slow the code down
- py-spy is written in Rust and is sampling-based: it reads the call stack of the running program from a separate process
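As a starting point, the profiling step can be sketched with cProfile from the standard library. The functions `slow_sum_of_squares` and `analysis` are hypothetical stand-ins for real analysis code, not something from the talk.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately slow pure-Python loop (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def analysis():
    """Stand-in for a data-analysis script that calls the slow part repeatedly."""
    return [slow_sum_of_squares(20_000) for _ in range(20)]


# Collect profiling data only around the code of interest.
profiler = cProfile.Profile()
profiler.enable()
result = analysis()
profiler.disable()

# Write the five entries with the highest cumulative time into a string.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
profile_text = report.getvalue()
print(profile_text)
```

The report lists each function with its call count and cumulative time, which is usually enough to see where the time goes. For a sampling view without modifying the script, py-spy can be run from the shell, for example `py-spy record -o profile.svg -- python script.py`, where `script.py` is a placeholder name.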
2. Efficient IO
- Use binary file formats instead of text-based ones
  - Text-based
    - CSV
    - JSON
    - YAML
  - Binary
    - HDF5
    - npy
    - parquet
    - pickle
    - sqlite
    - zarr
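A minimal sketch of the difference, assuming the data is a plain NumPy array: the same array is written once as CSV and once as npy, then both files are read back. The array shape and file names are arbitrary choices for illustration, and the measured times will vary by machine.

```python
import os
import tempfile
import time

import numpy as np

# Hypothetical dataset: 1000 rows, 10 float64 columns.
rng = np.random.default_rng(0)
data = rng.random((1000, 10))

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    npy_path = os.path.join(tmp, "data.npy")

    # Text-based: every number is formatted as a decimal string.
    np.savetxt(csv_path, data, delimiter=",")
    # Binary: the raw bytes of the array plus a small header.
    np.save(npy_path, data)

    csv_size = os.path.getsize(csv_path)
    npy_size = os.path.getsize(npy_path)

    start = time.perf_counter()
    from_csv = np.loadtxt(csv_path, delimiter=",")
    csv_seconds = time.perf_counter() - start

    start = time.perf_counter()
    from_npy = np.load(npy_path)
    npy_seconds = time.perf_counter() - start

print(f"CSV: {csv_size} bytes, loaded in {csv_seconds:.4f} s")
print(f"npy: {npy_size} bytes, loaded in {npy_seconds:.4f} s")
```

The binary file skips number parsing entirely, which is where text formats tend to lose time on load.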
3. Vectorization
- Use NumPy built-in methods instead of Python loops
- Pandas may have some bottlenecks. Alternatives:
  - Polars: https://github.com/pola-rs/polars
  - Modin [Multi-threaded]
  - Vaex
  - Dask DataFrames
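To illustrate what vectorization means in practice, here is a small sketch that computes a population variance twice: once with explicit Python loops and once with NumPy array operations. The function names are made up for this example.

```python
import numpy as np


def variance_loop(values):
    """Population variance with explicit Python loops (slow for large arrays)."""
    mean = 0.0
    for v in values:
        mean += v
    mean /= len(values)

    acc = 0.0
    for v in values:
        acc += (v - mean) ** 2
    return acc / len(values)


def variance_vectorized(values):
    """Same computation expressed as whole-array NumPy operations."""
    return float(np.mean((values - values.mean()) ** 2))


values = np.array([1.0, 2.0, 3.0, 4.0])
print(variance_loop(values))
print(variance_vectorized(values))
```

Both functions give the same result, but the vectorized version runs its loops inside NumPy's compiled code instead of the Python interpreter.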
4. Memory and Precision Tradeoff
- Change data types
  - Sometimes you may not need the full precision or value range
  - int32 vs int64
  - float32 vs float64
- Use iterative methods
  - e.g. divide and conquer
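The tradeoff can be sketched with NumPy dtypes. The helper names `downcast_ints` and `chunked_mean` are hypothetical, and reading "iterative methods" as chunk-by-chunk processing is an assumption about what the talk meant.

```python
import numpy as np


def downcast_ints(values):
    """Return an int32 copy if every value fits, otherwise the original array."""
    info = np.iinfo(np.int32)
    if values.min() >= info.min and values.max() <= info.max:
        return values.astype(np.int32)
    return values


def chunked_mean(values, chunk_size=100_000):
    """Mean computed chunk by chunk (assumed reading of 'iterative methods')."""
    total = 0.0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        # Accumulate in float64 so the reduced storage precision does not
        # also degrade the running sum.
        total += float(chunk.sum(dtype=np.float64))
    return total / len(values)


# int64 vs int32: half the memory when the value range allows it.
ids = np.arange(1_000_000, dtype=np.int64)
small_ids = downcast_ints(ids)
print(ids.nbytes, small_ids.nbytes)

# float64 vs float32: half the memory, at the cost of rounding error.
measurements = np.linspace(0.0, 1.0, 1_000_000)
measurements32 = measurements.astype(np.float32)
max_error = float(np.abs(measurements - measurements32).max())
print(measurements.nbytes, measurements32.nbytes, max_error)

print(chunked_mean(measurements32))
```

Whether the rounding error reported by `max_error` is acceptable depends on the analysis, so it is worth checking on real data before switching types.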
5. JIT compiling with Numba
- Use the @numba.njit decorator on numerical functions
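A minimal sketch of the decorator in use, assuming a loop that is awkward to vectorize. The function `clipped_sum` is a made-up example, and the fallback that replaces `njit` with a no-op when Numba is not installed is only there so the snippet runs anywhere.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback for illustration only: without Numba the function
    # simply runs as ordinary (slow) Python.
    def njit(func):
        return func


@njit
def clipped_sum(values, limit):
    """Sum the values, counting anything above `limit` as `limit`."""
    total = 0.0
    for v in values:
        if v > limit:
            total += limit
        else:
            total += v
    return total


data = np.array([1.0, 5.0, 10.0])
result = clipped_sum(data, 4.0)
print(result)
```

With Numba installed, the first call compiles the function for the given argument types, so it is slower than later calls. Timing should be measured on a second call.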