If you know the size of your typical dataset and the type of data analysis you do, finding the best laptop for data science for you is pretty straightforward. For example,
A) If, like most data scientists, you work with R & Pandas on data sets that fit in memory (4GB-16GB) to fit non-deep-learning models: any modern laptop that can be upgraded to 16GB RAM will do. Optionally, you can speed up the process by choosing the fastest CPU (the M1 & M2 chips are the fastest as of 2023).
B) If you work with parallel-processing libraries that use GPU cores (e.g., deep learning): you want a laptop with at least 6GB vRAM for NLP (text data) and as much vRAM as possible if working with CV (image data). Not gonna lie, a desktop with a 3090Ti would be a better choice for the latter.
C) Optionally, computer clusters (see featured image) let you train or process data of any size and complexity (deep learning, neural networks, machine learning, etc.) hundreds of times faster than any computer you can buy. You can use ANY laptop of your choice to connect to these services.
Now, before we get to the best laptops for data science…
I'll summarize that advice and go over the ideal hardware specs for data science.
Best Laptop Specs for Data Analysis
The motto for data analysis is “With greater data sets comes greater insights”.
Greater data sets also require more hardware…usually RAM.
RAM’s the #1 most important hardware resource for data science.
As data sets grow larger, RAM becomes the first bottleneck. If you have 2x the RAM of your biggest data set, things can speed up by an order of magnitude because all your processing happens in-memory (RAM).
16GB RAM: the bare minimum for data scientists. You won't find it out of the box on budget (350-600 dollar) laptops, but you can always upgrade the RAM (some models go up to 32-48GB).
Faster CPUs are always good, but since most modern CPUs are already very fast, RAM becomes the main bottleneck long before the CPU does.
Here's what I mean: if a CPU can process 10^5 pieces of data per second but 32GB of RAM can only serve 10^4 pieces per second, what's the point of buying a faster CPU?
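As a quick sanity check, here's a minimal sketch of that 2x rule of thumb in Python. The function name and the ~5GB OS/app overhead figure are my own illustration, not a standard:

```python
# Rule of thumb from above: processing is fast when usable RAM is at
# least 2x the size of your largest data set (everything stays in-memory).
OS_OVERHEAD_GB = 5  # assumed: Windows 11 (~4GB) + background apps (~1GB)

def fits_comfortably(dataset_gb: float, ram_gb: float) -> bool:
    """True if the data set can be crunched fully in memory with headroom."""
    usable = ram_gb - OS_OVERHEAD_GB
    return usable >= 2 * dataset_gb

print(fits_comfortably(3, 16))   # a 3GB data set on a 16GB laptop: True
print(fits_comfortably(10, 16))  # a 10GB data set would spill to disk: False
```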
Assuming you have the luxury to choose a fast CPU AND you have maxed out on RAM memory:
If working with R & Python, choose the CPU with the highest clock speed (algorithms are mostly single-threaded).
Otherwise, grab a CPU with 8 cores as the limit (this is roughly the maximum on laptops anyway).
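To see why clock speed matters more than core count for single-threaded work, here's a toy sketch using only the standard library. `heavy` is a made-up stand-in for any CPU-bound routine:

```python
from concurrent.futures import ProcessPoolExecutor

def heavy(chunk):
    # stand-in for a CPU-bound routine (model fitting, resampling, etc.)
    return sum(x * x for x in chunk)

def single_core(data):
    # how most R/Python statistical code runs: one core, clock-speed bound
    return heavy(data)

def multi_core(data, workers=4):
    # how parallel-aware libraries spread the same work across CPU cores
    chunks = [data[i::workers] for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as ex:
        return sum(ex.map(heavy, chunks))

if __name__ == "__main__":
    data = list(range(100_000))
    assert single_core(data) == multi_core(data)  # same answer either way
```

Unless your library actually splits work the way `multi_core` does, extra cores sit idle, which is why an 8-core ceiling is reasonable.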
The fastest CPU for most CPU-intensive algorithms and libraries is the M2 Max in the MacBook Pro. The M1 & M2 chips beat Intel & AMD on benchmark tests.
NVIDIA CUDA: If you WANT to work with deep neural networks or parallel-computing algorithms that process IMAGES, then you want a dedicated GPU with as many CUDA cores as you can afford & as much vRAM as possible. On laptops, this is the 3080Ti (16GB vRAM), but you'll get the best bang for your buck with a desktop 4090. Note that in this scenario the data set must fit into vRAM rather than RAM.
If you can't get a dedicated GPU (or can't afford one) and you're just getting started, don't sweat it. Most data scientists use cloud services for this kind of processing, and you should too.
Storage speed (SSD type) has little impact, if any, on the data-crunching process, but if you want to maximize speed when transferring files from drive to drive, the fastest is PCIe NVMe 4.0.
Good keyboards on laptops are not easy to find and are usually expensive. If you don't feel comfortable with the built-in keyboard, no worries: just get an external keyboard. An external mouse or trackball is a MUST; you don't want RSI or tendonitis.
Min FHD 15” screen: chances are you'll be ssh'ing into a more powerful machine or using the cloud at some point, so the extra screen space becomes super useful for seeing longer commands at a time.
Thus, if you're going for a Windows laptop and can afford it, pick one that supports a Linux-flavored OS seamlessly, like a Lenovo ThinkPad, since you may end up using Linux entirely (though most people will be fine with a Linux virtual machine on a Windows laptop).
Top 6 Best Laptops for Data Science
In this list I've tried to include a laptop for EVERYONE: beginners, students and data scientists (those into parallel programming, machine learning, deep learning, and those using AWS/cloud services, etc.).
Just read the descriptions carefully and you’ll be sure to find your best pick.
Best Budget Laptop for Data Science
Core i5 12500H
3050Ti RTX (4GB vRAM)
512GB PCIe NVMe SSD
15.6” 60Hz Full HD IPS
This is a sort of all-round basic laptop for data science.
What I mean by this is that it has ALL the hardware you need for data science (a dedicated GPU and the minimum RAM: 8GB) and it's not as expensive as the next laptops we'll go over.
So this is useful for anyone getting started with data science, or anyone who wants to experiment and perhaps even run some ML or parallel-processing-dependent algorithms that aren't overly heavy (for testing purposes).
Why choose the HP Victus over any other 3050Ti laptop?
Now, although there's a whole myriad of 3050Ti laptops on the market at around the same price, there are two reasons WHY this is the BEST 3050Ti laptop for data science:
- It’s got the latest 12th gen Core i5 CPU. Though not a MUST nor a significant performance boost to data science algorithms, it is still a nice addition to the arsenal.
- RAM upgradeability. It supports up to 64 GB RAM!
RAM: 8GB RAM (Up to 64GB)
Leaving upgrades aside, 8GB RAM is enough for simple statistical and ML/DL models of small data sets.
Also enough to run any data analysis package and software (R/MATLAB/SAS, etc).
Now what do I mean by a small data set?
Anything around 300k rows with 4 variables each, or a data set that weighs around 300MB.
Now…Windows 11 takes around 4GB. Background programs + your software: 1GB. That leaves you with 3GB of RAM, about 10 times the size of a small data set.
Maximum Data Set for 8GB RAM
The maximum size would be 3GB.
Which is approx 3GB/300MB ≈ 10 small data sets ≈ 10 × 300k rows = 3,000k rows with 4 variables.
If you’re getting started with data science, it is very unlikely you’ll work with a dataset this big. If you do encounter datasets much bigger (10GB), then you can use the cloud regardless of what laptop you use.
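If you want to sanity-check figures like these against your own data, you can estimate the in-memory footprint from row and column counts. A sketch (the 8 bytes/value default assumes numeric float64 columns; text columns take far more, which is why the 300MB figures above imply wider rows):

```python
def estimate_mb(rows: int, cols: int, bytes_per_value: int = 8) -> float:
    """Rough in-memory size of a table; float64 values are 8 bytes each."""
    return rows * cols * bytes_per_value / 1024**2

# 3 million rows x 4 purely numeric variables:
print(f"{estimate_mb(3_000_000, 4):.0f} MB")  # ~92 MB as float64
```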
Q: But but but…I don't want to use the cloud. I want a laptop that can run much bigger data sets!
The solution is simple and this is why I picked the HP Victus over all other laptops.
Since this laptop has dual RAM slots and no RAM soldered to the motherboard, you can replace the 8GB with two 32GB sticks, giving you a total of 64GB RAM.
Which basically lets you work with data sets close to 60GB! That's equivalent to 3,000k × 20 = 60,000k rows = 60 MILLION rows with 4 variables each.
If you do happen to run into much bigger data sets (which isn't likely), then you can use the cloud!
You'll have no other choice, really, because even most desktops don't support RAM in the hundreds of gigabytes. Data sets this big for CPU-based data crunching are extremely rare.
GPU: 3050Ti Parallel Processing
It is no secret that parallel-computing tasks like deep learning and machine learning run faster with more cores, and that today (2023) most libraries use GPU cores instead of CPU cores. If you're into any field that uses GPU cores and have a budget under 750 dollars, I want you to take a GOOD look at the table below.
It should be pretty obvious WHY I listed the slightly more expensive 3050Ti over any of the other 'somewhat cheaper' laptops with dedicated GPUs: you get 2x the CUDA cores, which means your computing performance will roughly double. Now, that's only speaking of small data sets for machine/deep learning (4GB vRAM), which will usually be text data or smaller test batches of image data.
Deep learning/machine learning with IMAGE data sets beyond 30GB will always require cloud-computing services or a desktop, as no laptop has that much vRAM.
Fastest Laptop For Data Analysis
M2 Pro Chip 10 core (Up to 12)
24GB Unified Memory (Up to 96)
10 core GPU (Up to 19)
512GB-2TB PCIe SSD
13” Retina (Up to 16”)
Now, this is a very controversial laptop, but in my opinion it is the best laptop for data science as of 2023.
I'm sure you've heard that the M1 & M2 chips are 'optimized' for machine learning, deep learning and neural networks, and that they are faster than Intel chips for these tasks. This is very true:
- For deep learning, machine learning and neural networks, they significantly outperform 'mobile' CPUs from Intel/AMD, even in Core i9 laptops, due to faster and easier access to memory (RAM). The memory is 'unified', meaning the GPU has access to all the system RAM, unlike normal laptops where the GPU can only use its own vRAM. So if you configure a MacBook Pro or Air with 16GB or 64GB, all of that memory is available for GPU-based algorithms like deep learning.
- The M1 & M2 chips have more cores than any Intel mobile CPU, and this is ONE of the reasons why they're much faster for CPU-based algorithms. The second reason is that the chips' architecture has been optimized for machine learning.
There are TWO big catches though:
- For GPU-processing purposes, not many libraries are compatible with the M1 & M2 chips. Here's a list of the applications where they are both compatible and where the Apple chips outperform Intel & AMD CPUs.
- For deep learning with an NVIDIA GPU: as long as the memory size is the same, NVIDIA GPUs will outperform ANY Apple M1 & M2 chip, since the core counts on modern GPUs are much higher. The only reason some TensorFlow/PyTorch benchmarks seem to run better on the M1 & M2 chips than on Windows laptops with NVIDIA GPUs is the limited memory of laptop GPUs. Put a desktop GPU with 24GB vRAM (say, a 3090Ti) against an M1 or M2 chip with 24GB of unified memory, and the story is entirely different.
So why pick the MacBook M1 & M2 models?
Well, I wouldn't say you have to pick these models specifically. For most data science purposes (deep learning and GPU parallel tasks aside) you could pick older models too. Why? Because they all have:
- The UNIX-like environment. OSX is the most work-efficient OS for data science, just as good as a Linux system if not better.
It's not just that Python works best on a UNIX system, but also that all the software packages and languages are readily available out of the box, plus the ease of programming through the terminal.
- Unrivalled battery life & portability: if you have to ssh into a server, you can be away from your house and still check progress and upload more chunks of data as soon as you get near a WiFi spot.
- The M1 & M2 MacBook Airs & 2015-2023 MacBook Pros have a Retina resolution, which increases the amount of screen space: super useful to visualize data, get a bigger picture of your data & multitask.
If I can afford either the M1 & M2 MacBook Air or MacBook Pro, which one do you recommend?
It really comes down to your hobbies and non-data-science activities. If you're only using this laptop for data science purposes, grab the MacBook Air. If you're also going to use it for, say, video or photo editing, gaming, etc., then grab the MacBook Pro.
What about data science applications? What’s the difference between the two?
Truth be told, both are better than any Windows laptop for CPU-intensive algorithms and computations; the M2 Max MacBook Pro will be the fastest one, though. Either way, none of these models has a dedicated NVIDIA GPU with thousands of cores for parallel processing, so if you're working with deep learning you don't want a MacBook Air unless you're fine using a cloud service.
Best Windows Laptop for Data Analysis
Core i7 12800H
16GB DDR5 (Up to 64GB)
RTX 3050Ti 4GB vRAM
15” FHD IPS
This Lenovo ThinkPad, like ThinkPads in general, is one of the top 3 choices for data science for MANY reasons. We'll start with the most important one: Linux compatibility.
OS: Linux & Windows
Is it a requirement to use Linux?
No, Windows works fine too. All statistics platforms like R, scikit-learn, and the many, many others available run on all three operating systems: OSX, Linux & Windows.
However, unlike Windows, and like OSX, Linux has an easy-to-use, easy-to-access terminal. A terminal is a MUST for connecting to other computing infrastructure, too. Windows has recently implemented one, but the OSX & Linux terminals are still the mainstream choice. If you're getting started in the field, that makes Linux an even better option: you'll easily find tutorials and guides for pretty much ANYTHING you want to do, and, more importantly, virtually every package and algorithm published by developers is written for UNIX systems (Linux & OSX) first.
Most data scientists use Linux for the above reasons, and you will move to a Linux system at some point too. If you want to speed up your career in data science, you might as well buy a laptop that's 100% compatible with Linux NOW.
This isn't a problem for OSX, which is basically a UNIX variant, so you don't have to install a Linux distro; you can just use it as it is.
It isn't a problem for most Windows laptops either. You can install Linux on any Windows laptop, including the HP Victus, but there's just one small issue: the trackpad or some of the display features MAY not be compatible. You just don't know WHICH features will work when you install Linux.
Lenovo ThinkPads are among the few that offer 100% compatibility, and you can even install Linux natively, that is, without resorting to virtual machines, and still get every piece of hardware fully working with Linux.
RAM: 16GB-64GB DDR5
ThinkPads support anywhere from 8GB to 64GB of RAM. The upper limit for most, however, is 48GB.
If you want a 64GB ThinkPad (64GB being the highest laptops can support), you'll have to head over to the official Lenovo website or buy the laptop I'm featuring here (to my knowledge, this is the only ThinkPad, besides workstation ThinkPads, that supports 64GB RAM).
DDR5: the fastest RAM
We've already talked about what kind of data sets 8GB of RAM (and, after an upgrade, 30GB+) can support. So let me add another tip here, about RAM 'generation'. You'll find three generations as of 2023 when shopping for laptops: DDR3, DDR4 and DDR5.
The HP Victus ships with 8GB of DDR4 out of the box, and once you do the upgrade you can install 64GB of DDR4. For everyday workloads there isn't a significant speed difference between DDR4 and DDR5 when it comes to feeding data to the CPU, but once you work with very large data sets, the difference is SIGNIFICANT.
The HP Victus, however, does not support DDR5, the latest generation.
DDR5 can't be found on just any laptop with a 12th-gen Intel Core or 6th-gen AMD CPU: the motherboard used by the manufacturer HAS to have sockets that support DDR5. In other words, very few laptops support DDR5, this Lenovo ThinkPad T15 P1 Gen 3 being one of them, as you can read in its specification sheet.
3050TI RTX + Core i7 12800H
Most ThinkPads do not have a dedicated GPU. This ThinkPad is not expensive because of its latest-generation RAM (DDR5); it's expensive because of the Core i7 12800H + 3050Ti combo (not found under $1000).
Now, we've already talked about what you can do with an NVIDIA GPU like the 3050Ti, but let me add that what can take a couple of hours with this 3050Ti RTX will take MINUTES on a cloud service. That's only speaking of deep learning, ML and neural networks.
I suggest that, regardless of how much RAM & GPU power your laptop has, you start using a cloud service every now and then: it not only looks good on your resume but also helps you finish gigs much faster, because it gives you the ability to run a lot of experiments with your data by getting you results FAST.
As for the CPU, this is one of the three fastest CPUs found on Windows laptops; it's the latest generation and a nice upgrade for CPU-intensive tasks, though it's still slower than the M2 Max MacBook Pro chip.
Best Cheap Laptop For Data Science
Intel Core i5-1215P
Intel UHD Graphics
14” FHD IPS
WiFi 6 802.11AX
This is the best basic laptop for data science and the one I’d recommend buying for those on a BUDGET that want to try to get into this field without investing much money, or as they call it: test the waters.
RAM: 16GB
Most data scientists do NOT work with large data sets, nor do they see the need for more than 8GB RAM. Most work with statistical models, output graphical results through Pandas, and so on. The few times they deal with parallel-processing tasks that also require tremendous amounts of RAM, most just use the cloud.
This is why an 8GB RAM laptop isn't such a bad idea. You can do pretty much all types of programming and teach yourself data science with 8GB RAM; testing libraries and small samples will not require more than that.
How about the CPU? Is it good enough for data science?
Yes, it's true this is not the fastest CPU, probably the slowest one released in 2023. However, given the budget constraints and the fact that it's only around 400 bucks, it's plenty fast AND it's of the latest (12th) generation. Most laptops at or under the same price still have a Ryzen 3 or Core i3 from previous generations.
A fast CPU isn't a deal breaker for data science anyway. I talk about it in the last section, but basically there are no significant gains from a faster CPU unless it has more CORES and the application you use makes good use of those extra cores.
TL;DR: this CPU is plenty fast for those getting started with data science.
Portability & Display: 3lb + FHD resolution
The main reason WHY I picked this laptop over all the other budget laptops is not that the CPU belongs to the 12th generation. It's that it's the most portable machine (3lb) in this budget range; in fact, its weight is pretty much the same as that of expensive ultrabooks like the MacBook Air. On top of that you get an FHD display, a MUST for multitasking (an SSH window/terminal + a tutorial or data rows), a combination you're very unlikely to find together with this weight and a 12th-gen Intel CPU.
Best Laptop for Data Science – Parallel Processing
32GB DDR4 (Up to 64GB)
NVIDIA GeForce RTX 3080Ti (16GB vRAM)
1TB PCIe NVMe SSD
16″ WQXGA (2560 x 1600) IPS 240Hz
This laptop has about as much hardware as you're going to get out of a personal computer for data science. Everything here is maxed out: CPU, RAM, GPU, storage. The latter is irrelevant for fast data crunching, but it becomes useful if you want to download and upload extremely large sets of data.
Only gaming laptops & workstation laptops have this kind of hardware.
CPU: Core i9 12900H or Ryzen 9 6900HX
The latest & most expensive workstation or gaming laptops will have either a Core i9 or a Ryzen 9. As long as both are the latest generation, they work equally well for data science CPU-dependent algorithms.
The advantage of going with the latest CPUs is that you automatically (at least 90% of the time) get support for DDR5 RAM, and the motherboards support 64GB RAM as well. You may find 128GB RAM on a workstation laptop if you look long enough, though they sell at extremely high prices; and to be honest, once you work with data sets of 150GB or so, CPU power starts to become the bottleneck for fast data crunching, and only cloud services will give you fast results.
GPU: 3080Ti 8GB vRAM (or 16GB)
I have included a table of the graphics cards you'll find on modern laptops as of 2023 (see the GPU section below), and the one with the highest number of CUDA cores is the 3080Ti, with a whopping ~7,000 CUDA cores. Now, some applications make heavy use of vRAM, especially image-processing applications, and some 3080Ti laptops only have 8GB of vRAM despite having the same number of CUDA cores.
If you're going to work with a variety of data science applications (image processing + deep learning), double-check that you get the 16GB vRAM version. 16GB vRAM is a pretty good start for meaningful deep/machine learning results with real-world data, though data sets in these areas (images) will usually be much larger.
How To Choose the Best Laptop or Desktop Computer For Data Analysis & Data Science
What you’ll learn in this section is how to get as much computing power out of a desktop or laptop for data analysis. This is going to help you maximize the specs/money ratio and thus help you find the best laptop for data science for a given budget.
I’ll go over the basics and talk about the most common software & type of data analysis and summarize the whole section first.
You can either read it or jump into the hardware specs: CPU , RAM, GPU, Storage, Display, etc, sections if you’re interested in a particular spec.
Two ways to do Data Analysis
A) Using the Cloud
You should learn how to use the cloud whether this is solely going to be the way you do the data processing or not.
Using the cloud means renting a server (for computing) that does all the processing. Cloud services use clusters of computers to do the processing hundreds of times faster than a personal computer.
For example, Amazon Web Services gives access to on-demand EMR multi-machine clusters per hour including all of their data stores like ElasticSearch, Redshift, etc.
How to use a cloud service?
You just need a 4-8GB RAM laptop with an internet connection and a terminal to ssh with (Chrome OS does not have a terminal). Extra battery life may be helpful if you want to check progress away from home (laptops with 4-8GB RAM usually have decent batteries due to their low CPU power).
It is not uncommon to rely solely on the cloud. People often start with Hadoop clusters first, then move on to more general-purpose cloud services or compute farms, but you can jump right into compute-farm services like AWS.
The most common way to use a compute farm is to download a small sample and test it on a laptop, then feed the full data set into the cluster.
B) Personal Computer
The most powerful personal computer for data analysis is going to be a desktop with:
- A high clock speed & multi core CPU (multi core AMD CPUs have better specs/money)
- 128GB of RAM.
- SSDs in a RAID set up.
- GPU with the highest vRAM & CUDA core count available. As of 2023 this is the 4090 with 24GB vRAM.
You can have your own personal server too. The cheapest ones will be older machines (but since they are clusters, they'll still be faster than your average desktop); they can be found on:
- Amazon, Ebay or any other e-commerce site.
- Data science Facebook groups: some people will post their set ups for sale.
It isn't uncommon to find a 32-core Linux server with 64 GIGs of RAM for 400 bucks.
Software & Hardware Specs
Before we get into hardware specifics (CPU, RAM, GPU & SSD for data science), I want to talk about the most common software and libraries and briefly mention the hardware specs that help, because, as mentioned before, some workflows (software & algorithms) will find some specs more useful than others.
A) Student
Data science students use a combination of the following software/languages:
Most of these are just libraries; any laptop with 8GB RAM (or even 4GB if you use Linux) can run these programming languages and libraries with no issues.
Plus there isn’t going to be any big data crunching (mostly sample tests) and if there is, you will have free cloud services (or university computers you can SSH into).
The only struggle you're going to have is getting R and Python with all their packages installed on a laptop; it takes a WHILE to do the whole process error-free.
My first time, the installation process took me a week. Today you don't have to spend a week (maybe a day at most): there are plenty of tutorials and guides on how to do this fast and efficiently.
The whole process is far easier on Linux systems, followed by MacBooks, then Windows.
If you're a student, I'd recommend OSX (Apple) to get you started. It is the perfect balance between an easy-to-install package ecosystem and an easy-to-use OS. If price is an issue, you can buy the older models, which sell for as low as 300 bucks, a bit more if you want more RAM (500 for 16GB). All older models will work just as well as the newer ones because OSX (the operating system) is regularly updated.
B) Data Scientist
Once you add Hadoop to your arsenal, it means you're going to run data sets in the GB range, and this is where hardware specs become crucial.
I'm sure you've read about the three types of problems in data science: volume, velocity and variety.
Well, Hadoop mostly addresses volume & velocity problems, and this is why most people use a cloud service.
This post is about laptops so it assumes your datasets are relatively small (less than 20 gigabytes for images and less than 64GB for text).
A small data set can be said to be "anything that fits in RAM memory". If you have a data set larger than 50-100GB, that usually means you have to use distributed computing, even if you just want to perform simple calculations.
Most data scientists (especially those getting started) deal with "variety problems", meaning a bit of everything. In this scenario data sets are small, and most laptops or desktops with 16-32GB RAM and/or 4-8GB vRAM GPUs (for deep learning & machine learning samples) will be enough.
Machine & Deep Learning
In the case of ML, results are highly dependent on the size of the data: more data equals better results. More data means you'll need more storage, but since all the processing is done in memory, RAM and vRAM are the #1 most important features. Say you have a 16GB data set to train on; then IDEALLY you want a 16GB RAM & 16GB vRAM laptop. As for the CPU, your focus should be on GPU cores rather than CPU cores.
If you use R (e.g., the RevoScaleR package), most packages and libraries will be RAM & CPU dependent. That means vRAM & GPU are useless there.
However, the main bottleneck is still disk I/O and RAM. In other words, you will run out of RAM before you need more cores.
Given the constraints of most R algorithms and the physical constraints of laptops, I'd say 8 cores is a good 'maximum' number of cores for data science with R.
Hadoop is an innovative way to build models with limited computing resources. It was invented because of the technological (hardware) constraints of computers relative to the large size of data sets.
How does it work?
Well, we know machine-learning algorithms output better results with more and more data (particularly for techniques such as clustering, outlier detection and product recommenders). So, in the absence of computing resources, a good approach is to use a "small sample" of the full data set (the small sample being basically whatever amount fits in RAM). You then develop and run the algorithms on this small sample, so you can later get results on the full data set without having to process ALL of it locally.
The way you do this is by writing a map-reduce job (a PIG or HIVE script), launching it directly on Hadoop over the full dataset, then getting the results back to your laptop, regardless of how big the data set is.
Hadoop ALSO has linearly scalable storage and processing power, which lets you store the entire data set in raw format and run exploratory tasks to get results over the full data.
In practice, data scientists usually just use a laptop with limited RAM and CPU/GPU power to test a small sample, then use cloud services to run the algorithms on the full dataset.
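That sample-locally, run-in-the-cloud workflow looks something like this in pandas. The 1% fraction and the function name are arbitrary choices for illustration:

```python
import pandas as pd

def local_prototype(df: pd.DataFrame, frac: float = 0.01, seed: int = 0):
    """Develop and debug the pipeline on a small random sample; the same
    job is later submitted over the full data set on the cluster."""
    sample = df.sample(frac=frac, random_state=seed)
    # ... cleaning / feature engineering / model code goes here ...
    return sample

# stand-in for a data set far larger in practice:
df = pd.DataFrame({"x": range(10_000), "y": range(10_000)})
print(len(local_prototype(df)))  # 100 rows instead of 10,000
```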
Pandas is mostly used to read CSV and Excel files for cleaning, filtering, partitioning, aggregating and summarizing data, and to produce charts and graphical representations of it. This doesn't need any special hardware; any laptop can do it, even older, cheaper models (especially if you install Linux).
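A minimal example of that kind of Pandas workflow (the column names, values and the `sales.csv` file name are made up):

```python
import io
import pandas as pd

# stand-in for pd.read_csv("sales.csv"); the file name is hypothetical
raw = io.StringIO("region,sales\nnorth,100\nsouth,\nnorth,250\nsouth,80\n")

df = pd.read_csv(raw)
df = df.dropna(subset=["sales"])               # cleaning: drop missing rows
summary = df.groupby("region")["sales"].sum()  # aggregate per region
print(summary)
# summary.plot(kind="bar") would chart it; no special hardware needed
```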
Even if you're working on an application that requires joining large tables with billions of rows to create a vector for each data object, you only need HIVE or PIG scripts, which again run on pretty much any laptop.
If you want to train a heavy neural network, though, that's not something you can do on a laptop, any laptop, because the repeated-measurement analysis (and the consequent growth in variance-covariance) will make most computers run out of resources. Here you want either some sort of supercomputer (a server) or cloud services.
Hardware & Data Science
From here on we’ll talk about how each piece of computer hardware affects the speed performance of data science algorithms & software.
RAM
This is the single most important component for data science applications. Luckily, it is the easiest spec to upgrade and the cheapest to buy.
RAM for Basic Data Analysis
The CPU can process data WAY WAY WAY faster when it's temporarily stored in RAM rather than on a hard disk drive or solid state drive.
Take, for example, information written on the front page of a piece of paper. If you try to read it with the paper facing backwards, it's going to take you a lot more time to decipher the message, and if you place the paper 3 feet away from you, it's going to be even more challenging.
Having the paper facing forward and only a foot away makes it EASY for YOU (the CPU) to read. This is what RAM feels like to the CPU: data close by and properly positioned.
Having the paper backwards and 6 feet away makes it SUPER hard for YOU (the CPU) to read. This is what it's like for the CPU to read and process data that is NOT stored in RAM but on your storage device (also called the scratch disk).
Without enough RAM, data queues up or gets processed straight from the storage drive, and that makes things very SLOW.
Data Set Size – Text Data
How much memory do we need for various size of data sets?
Experience tells me that 30% will be happy with 4GB, 75% with 8GB, 85% with 16GB and 95% with 32GB and 100% will be happy using the cloud.
4GB RAM: enough for a small data set
A small data set takes approx 300MB. This is equivalent to a set of 100,000 to 200,000 rows with 200 variables.
Assuming you only work with this much data AND you are NOT going to do something more CPU-intensive like visualizing ALL of the data at once, a 4GB RAM laptop like the older MacBooks will do. You only need to spend 200 bucks here!
8GB RAM: Good for Medium-Large Data Sets
A data set about 25 times bigger than the small data set can be considered 'large' for personal-computer purposes.
This is equivalent to 25 × 200,000 rows with 200 variables, which will barely fit in 8GB RAM (the OS & background software take almost 4GB).
There are ways you can SQUEEZE and thus PROCESS that much data despite the lack of RAM, but you need really good data-analysis/scripting/programming skills. Since this is a valuable skill set, it's something you should learn ASAP anyway.
Regardless of how big your data sets are, I highly recommend EVERYONE get 16GB RAM as the bare minimum for a speedy workflow.
Why? Because good things happen (massive performance gains) when you have 2x the RAM of your largest chunk of data. Upgrading a laptop's RAM is EASY, so if your current laptop seems "slow", do the upgrade RIGHT NOW, BEFORE you buy a new laptop.
Just how much performance gains are we talking?
A large data set that cannot fit in 8GB RAM (only ~4GB of which is available for data) might take 4 hours to process, as opposed to 20 minutes with 16GB RAM (where a large part of the data, or all of it, fits in RAM).
Q: How is 16GB going to help me if I only work with small data sets?
If you have a 2GB dataset, 8GB RAM will obviously be enough (8GB leaves about 4GB available for data crunching). However, having the extra RAM means you will spend less time being 'careful' about how the data is represented and whether you can afford a new variable to store a permutation of the data.
Another reason is being able to run an algorithm against multiple versions of the same data, which is somewhat heavier and thus requires more RAM.
Q: How much RAM do I need for MY dataset? How do I find out?
First, open YOUR data set.
1. Press CTRL+SHIFT+ESC (or CTRL+ALT+DEL --> Task Manager) to open the Task Manager.
2. Click the Performance Tab–> Memory.
3. Check the “memory” and “virtual memory” columns.
Write down that number, multiply it by two, then add the OS overhead (~4GB) + background apps (~500MB).
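If your data lives in a pandas DataFrame, you can also skip the Task Manager and ask pandas directly. A minimal sketch (the random frame here is a stand-in for your real data):

```python
import numpy as np
import pandas as pd

# Stand-in for your real data; in practice use pd.read_csv("your_data.csv")
df = pd.DataFrame(np.random.rand(100_000, 20))

# deep=True measures object/string columns accurately instead of
# counting only the 8-byte pointers to them.
in_memory_mb = df.memory_usage(deep=True).sum() / 1024**2
print(f"dataset uses ~{in_memory_mb:.1f} MB of RAM")
```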
Q: Why does everything in my computer run slow with large data sets?
Because you don't have enough RAM. When this happens, the OS starts to "thrash": it swaps processes out of memory to let others run, and some of the evicted items are crucial for performance.
Q: But Quora told me RAM doesn’t matter!? My laptop can still run large data sets regardless of how much RAM I have…
That’s somewhat true.
For example, if you have 6GB dataset.
You can run scripts on that dataset with 4GB RAM IF you divide the dataset into smaller batches and process them separately. Later, you combine the results and you're done.
On the other hand, if you have 12GB RAM with a 6GB RAM data set, you can process the whole thing in one go. This will obviously be much faster.
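The batching trick above can be sketched in pandas with `read_csv(chunksize=...)`. This toy version streams an in-memory CSV, but the same code works unchanged on a multi-gigabyte file path:

```python
import io

import pandas as pd

# Tiny stand-in for a file too big for RAM; in practice pass a path
# like "big_dataset.csv" instead of the StringIO buffer.
csv_file = io.StringIO("value\n" + "\n".join(str(i) for i in range(10)))

# Stream the file in fixed-size batches, keeping only running totals,
# so peak memory use is one chunk rather than the whole dataset.
total, rows = 0.0, 0
for chunk in pd.read_csv(csv_file, chunksize=4):
    total += chunk["value"].sum()
    rows += len(chunk)

print("mean of 'value':", total / rows)  # mean of 0..9 is 4.5
```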
Q: What about the Data Preparation Process?
Data preparation can reduce the need for more RAM.
What’s data preparation?
Data scientists have two set of skills: preparing big data (usually in disk processing through Unix Grep, AWK, Python, Apache Spark,etc) AND in-memory analytics (R, Python, Scipy,etc) programming skills.
When your data sets are small or you have way more RAM than you need, you DON'T NEED to know how to prepare data.
It only becomes relevant when working with text analytics, where the amount of input data is naturally big.
RAM for Deep Learning
Deep learning will mostly need vRAM, the memory on the GPU (we'll talk about that soon), if you want acceptable performance. You could in theory do deep learning with the CPU & RAM, but it would be extremely slow compared to what a GPU & vRAM can do.
Now, that doesn't mean RAM is useless for deep learning. You still need RAM because it's the first place data is moved to from storage before being moved to GPU memory. In other words:
Data Set —> Download From Internet —> Storage —> RAM —> vRAM
Thus if you have to work with a 16GB data set, then you need 16GB of RAM and ideally 16GB vRAM as well if you want high performance.
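That RAM-to-vRAM hop maps onto a couple of lines of PyTorch. This is only a sketch: it assumes the optional `torch` package and falls back to the CPU when no CUDA GPU is present.

```python
try:
    import torch

    data = torch.rand(10_000, 100)  # the tensor now sits in RAM
    # Move it into vRAM if an NVIDIA (CUDA) GPU is available.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    data = data.to(device)
    print("tensor lives on:", data.device)
except ImportError:
    print("torch not installed; pipeline shown for illustration only")
```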
RAM for Neural Networks
Neural networks are the foundation of deep learning, so most tasks are more efficient with vRAM as opposed to RAM. You can use the same thought process when buying RAM for neural networks as you do for deep learning.
RAM for Machine Learning
Machine learning will, most of the time, lean on vRAM instead of RAM too. However, some algorithms run on the CPU and are more efficient with plenty of system RAM instead.
RAM for Computing Cluster( the Cloud)
Using any cloud service does not require extra RAM; you only need about 8GB for the operating system (Windows) to run fast, a good WiFi card or an ethernet port, AND a decently sized storage drive if you're planning to upload very large data sets from your computer to the cloud.
I would still recommend upgrading to 16GB so you can build a reasonable amount of test data to run on your desktop or laptop first before uploading it to the cloud.
2. CPU (Processor)
CPUs for Basic Data Science
For basic data science and CPU-based algorithms, being picky about which CPU to get doesn't matter much. Yes, you will get better performance with faster CPUs, but nothing tremendously significant. As long as you have enough RAM, you should get roughly the same speeds.
Now assuming you want the “BEST” of the best, yes, choosing a faster CPU will speed up any data crunching process but you have to focus on clock speed instead of cores.
Quick CPU Lesson: What are cores and what is clock speed?
#Cores: Modern CPUs (from the 2000s onwards) do not have a single processing unit; they contain 2-8 cores that can all work at once when the application supports it. Ex: a quad-core CPU is like having 4 researchers working on a problem instead of just one.
It's common sense that the more people you put on a project, the less time it takes to finish, right? Well, that's not always the case: you often have to wait for one result before starting the next step, so having more people does not necessarily accelerate tasks.
Some tasks and processes do benefit from more researchers, or 'cores'; these tasks are said to run 'in parallel'.
Most tasks in data science (at least when you get started) are single-threaded, meaning they will only use ONE core. Thus the clock speed of the CPU (hence of any single core) is the most relevant CPU spec to look for.
The table below shows you the most common CPUs you’ll find on laptops:
| CPU | Base (GHz) | Turbo (GHz) | Cores |
|---|---|---|---|
| Ryzen 9 6980HX | – | – | – |
| Ryzen 9 6900HS | – | – | – |
| Ryzen 7 6800HS | – | – | – |
| Ryzen 9 5900HX | 3.3 | 4.6 | 8 |
| Ryzen 9 4800HS | 2.2 | 4.4 | 8 |
| Ryzen 7 5800H | 3.3 | 4.4 | 8 |
| Ryzen 7 3750H | 2.3 | 4.0 | 4 |
| Ryzen 7 5800U | 1.9 | 4.4 | 8 |
| Ryzen 7 5700U | 1.8 | 4.3 | 8 |
| Ryzen 7 3700U | 2.3 | 4.0 | 4 |
| Ryzen 5 5600H | 3.3 | 4.2 | 6 |
| Ryzen 5 3550H | 2.1 | 3.7 | 4 |
| Ryzen 5 5500U | 2.1 | 4.4 | 6 |
| Ryzen 5 3500U | 2.1 | 3.7 | 4 |
| Ryzen 3 5300U | 2.6 | 3.8 | 8 |
| Ryzen 3 3300U | 2.1 | 3.5 | 4 |
I'm including CPUs from four generations here; you'll find them on modern laptops, even those brand new in 2023.
Do note that the "base" column is the sustained clock speed, while "turbo" is the maximum clock speed.
Notice how the clock speeds are very close to each other despite some CPUs being way more expensive and more recent.
For the average basic work done on a laptop for Data Science all of these clock speeds are fast enough.
If you want to maximize CPU power for parallel scripts & algorithms in data science, just pick an 8-core CPU: some algorithms and tasks will take advantage of CPU cores (up to 8 on average) when they do not support GPU cores.
For Intel & AMD CPUs: Which Clock Speed is good?
The reason why you should not worry too much about “clock speed” to speed up data science tasks is because you will run out of RAM memory before you need more clock speed.
Large Data Set Example
Say you have to run calculations on a 128GB data set. If you could fit all of it into 128GB of RAM, then yes, a faster CPU would speed up the process. But laptops only support up to about 48GB of RAM, so with a data set bigger than that, a faster CPU will not speed things up.
Small Data Set Example
Now if you have a small dataset (8GB) and a total of 16GB RAM (thus fitting all data in RAM), a faster CPU will make data processing faster, HOWEVER not by much, because the clock-speed differences are small (4.4GHz vs 4.0GHz). Something that takes 15 min with a 4.0GHz CPU might take around 13-14 min with a 4.4GHz CPU. Is that worth an extra 200-300 dollars? It's up to you.
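For a CPU-bound, single-threaded job, the best case is that runtime shrinks in proportion to the clock-speed ratio, which is where that rough 15-vs-13-minute figure comes from (the runtime is a hypothetical number):

```python
# Best-case scaling for a single-threaded, CPU-bound job:
# runtime shrinks in proportion to the clock-speed ratio.
baseline_minutes = 15            # hypothetical runtime at 4.0 GHz
speedup = 4.4 / 4.0              # ~10% higher clock
print(f"~{baseline_minutes / speedup:.1f} min at 4.4 GHz")  # ~13.6 min
```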
M1 & M2 Chips
Benchmarks show that the M1 & M2 chips outperform pretty much ANY Intel or AMD laptop CPU. This is due to the high clock speed of these chips AND the extra cores, plus their unified memory being faster and more efficient than the conventional RAM found on Windows laptops.
Cluster Computers & Cloud Services
Cloud services and cluster computers have an almost unlimited amount of RAM, and their CPU clock speeds can be roughly twice as fast. This is the only instance where you will see SIGNIFICANT performance gains when it comes to processing data.
Machine Learning, Deep Learning & Neural Networks
These rely on CUDA Cores found on NVIDIA GPUs rather than CPU clock speed. This is why computer clusters mostly focus on extremely fast GPU stacks with lots of cores & vRAM.
There are a few exceptions with machine learning; we've talked about those in the RAM section above.
GPUs are mostly useful for Machine Learning, Deep Learning, neural networks & image processing.
That doesn’t mean you will not find them useful outside of those fields.
As of 2023, parallel processing is finding its way into more and more data science applications each passing year, so a dedicated graphics card may be a good investment even for basic data analysis; what's said next MAY apply to those applications too.
NVIDIA vs AMD: CUDA Cores
All of this GPU hype is limited to graphics cards from NVIDIA, because that's what developers target when writing algorithms, AND NVIDIA also designs its graphics cards with the data science industry in mind.
Although you can use AMD GPUs (and perhaps other GPUs) for machine learning, deep learning, etc., it may not be a 100% solution and you may run into compatibility issues with MANY packages and scripts.
You can read all about how NVIDIA GPUs are useful and why AMD GPUs may not be as useful in the following article: “The best graphics cards for Machine learning“. But again it basically comes down to “CUDA” core technology being considered by developers at the time of writing and developing scripts and algorithms. Likewise, NVIDIA also designs their GPU’s architecture with these applications in mind.
Libraries & Packages with CUDA Compatibility
Pretty much all deep learning libraries and most machine learning libraries (TensorFlow & PyTorch) use CUDA cores on NVIDIA GPUs.
In fact, deep learning algorithms run almost entirely on the GPU instead of the CPU. Thanks to this, training runs that used to take a week on a CPU now take less than a day on a GPU. Image processing has used GPUs since its infancy, but it is now far more efficient at it.
Some algorithms and packages in Machine learning may use the CPU as we discussed before.
Q: So exactly which Data Science Software/Service/Tools make use of NVIDIAs CUDA core technology?
Within deep learning & neural networks, virtually ALL algorithms and libraries make use of CUDA core technology EXCEPT legacy software. As for machine learning, most of the popular libraries and packages offer some form of GPU support.
For applications outside of these fields, you should double-check whether your library or set of tools makes use of the GPU or supports parallel processing. Many people think that using a GPU, or a cloud service with a stack of GPUs (Ex: AWS), will massively speed up computation through parallel processing, only to find out it doesn't.
Q: We are talking about desktop GPUs right? Laptop GPUs are useless.
You've probably come across articles claiming laptop GPUs are useless because they are much, much weaker than desktop GPUs. That's partly true, but it in no way implies that laptop GPUs are useless for data science.
If you came across an article claiming laptop GPUs are useless, it was probably written 15 years ago.
Today's laptop GPUs (in fact, ever since the GeForce 10-series around 2017) are pretty much the same chips you find on desktops, EXCEPT that their TDP is reduced because laptops cannot accommodate a cooling system good enough to let the GPU hit its highest clock speeds. This cuts their performance by up to 30-50%. Thus, if you want the best performance for GPU parallel-processing tasks, you want a desktop GPU over a laptop GPU.
How to Pick a GPU: vRAM & CUDA Cores
vRAM & DataSet Size
As long as you have approximately as much vRAM as the typical size of your dataset, processing speeds will be fast.
For example, if you have an 8GB dataset, you want a GPU with 8GB of vRAM as a minimum. That is small compared to real-world datasets, which sit in the 50-100GB range and for which either desktops or computer clusters become ideal.
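The rule of thumb in this section can be written down as a tiny helper; the function and its `headroom` parameter are hypothetical names, and the 2.0 headroom only mirrors the 2x-RAM rule from earlier:

```python
def fits_in_vram(dataset_gb: float, vram_gb: float, headroom: float = 1.0) -> bool:
    """True if the dataset fits in GPU memory with the given headroom factor.

    headroom=1.0 is the bare minimum; 2.0 mirrors the 2x-RAM rule of thumb.
    """
    return vram_gb >= dataset_gb * headroom

print(fits_in_vram(8, 6))        # False: a 6GB card can't hold an 8GB dataset
print(fits_in_vram(8, 8))        # True: the bare minimum
print(fits_in_vram(8, 8, 2.0))   # False: no 2x headroom on an 8GB card
```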
CUDA Cores & Data Crunching Speed
Once all your data fits in vRAM, the GPU with more CUDA cores will be the fastest one. For example, in the table below there are several GPUs with 4GB vRAM; out of these, the 3050Ti will be the fastest for deep learning due to having almost 2x the number of "processing units".
Which GPU to pick?
It depends on what your focus is RIGHT NOW:
A) You're getting started with deep learning/machine learning through guides, videos, and tutorials (usually TensorFlow algorithms). There's no need to train ImageNet-scale vision models on your GPU, so a 4GB vRAM GPU like the 3050Ti (or any other 4GB vRAM GPU) should be fine.
B) If you want to work on your own projects, you will run significantly larger models. If your project is simple, you should be okay with a laptop GPU; if you can afford it, I would recommend an 8GB-16GB vRAM laptop GPU.
C) For large-scale projects for research or companies where real products are being developed, you have to use computer clusters, which have GPUs like the NVIDIA A100.
M1 & M2 vs NVIDIA GPUs: Deep Learning & Machine Learning
In the video above it may seem that the MacBooks beat the RTX chips for machine & deep learning, but this is only due to the lack of vRAM on the RTX GPUs found in laptops. The MacBooks have "unified memory", that is, the CPU & GPU SHARE the same RAM (there is no separate vRAM on these MacBooks), so they easily outperform RTX laptops as shown in the video.
Now the test can be carried with smaller data sets as shown below:
In the scenario where the datasets are small enough for RTX GPUs , the M2 Chips will outperform every NVIDIA GPU.
64GB Unified Memory vs 24GB vRAM Benchmarks
The problem with the above benchmarks is that the models are either too small or, when they are large, the MacBooks have more 'vRAM' so to speak, so the comparison is not fair.
Currently, the M2 MacBook supports up to 64GB of unified memory, meaning you can push datasets that large through the GPU for machine or deep learning. I have not run the benchmarks myself, but I would expect the M2 Max GPU with 64GB of unified memory to beat the 3080Ti laptop GPU with 16GB vRAM.
However, do note that a MacBook with the M2 Max chip is extremely expensive, and you can get a 24GB vRAM desktop GPU (RTX 4090) at a much lower price (this one would definitely outperform the M2 Max) for datasets around that size.
Size: 256GB min
Obviously, if you handle datasets in the gigabytes range and store them on your drive, you want at least 256GB or even 512GB.
If we are talking about reducing the time it takes to process datasets, how much storage you get doesn’t matter.
Type : Solid State Drive
For data processing, choosing a solid state drive (or the fastest solid state drive PCIe 4.0) will somewhat speed up the data crunching process because solid state drives can ‘feed’ data faster to RAM and CPU.
If we are talking about transferring data from storage to RAM, the difference between all types of solid state drives is MINIMAL.
In the scenario where your data set is much bigger than the RAM you have available and the CPU has to resort to do the processing straight out of the storage drive, then yes choosing the fastest Solid State Drive (PCIe 4.0 as of 2023) will make a significant difference.
However, the performance of CPU-RAM data processing is WAY faster than CPU-SSD data processing so you WANT TO AVOID running out of RAM.
Conclusion: the fastest PCIe NVMe 4.0 solid state drive isn't a MUST for data science. As long as you get ANY solid state drive, you should reap all the usual benefits of SSDs (boot your machine in ~5 seconds, instantaneous code lookup, launch software in seconds, etc).
A) Using the Cloud
If you’re going to use the Cloud because your datasets are extremely large (100GB range) then you want to pick an SSD with as much storage as you can afford.
A 512GB SSD is a good start. As for uploading data to the cloud, it won’t be any faster with the fastest SSD. Ex: choosing a PCIe 4.0 won’t make uploading 100GB to the cloud any faster than a SATA III SSD.
Size & Resolution: 15” FHD min (QHD if possible)
It's not a requirement to have a large screen, but it definitely helps when you work with large data sets and want the bigger picture of your graphs and data rows. It also makes it easier to SSH into more powerful machines/cloud services.
But that's not the main reason I'm pushing for a high-quality display. It's more about YOU having to stare at the screen for several hours a day, if not the entire day (at least when you're getting started); a decently sized display with high resolution will be MUCH easier on your eyes and less likely to cause eye strain down the road.
External Monitor: Optional
If possible, you should get an external display (assuming you don’t have a desktop back home) if you’re also going to work back home or you work mostly at home as this will make it MUCH MUCH easier on the eyes and MUCH MUCH easier to work with tutorials ESPECIALLY if you use both your laptop monitor + external monitor (or two external monitors).
All laptops support external monitors because all of them have an HDMI port and/or display port.
6. Cloud Services (For Newbies)
Cloud computing is basically paying for computer clusters to do the data crunching. These cloud services generally have thousands of computers, each with more RAM than what a desktop can support and several processors on each.
These computers go by the name of servers, meaning they are purpose-built to run a specific set of tasks. Ex: running a file system, running a database, doing data analysis, running a web application, etc.
Since they have nearly unlimited hardware resources, this is the way to go if you have a data set in the range of 100 GB and up.
In fact, it's a good choice for anything above 1GB if you would rather NOT spend money on a new laptop. It will usually turn out to be cheaper than buying a new computer. Linode, AWS, Microsoft, and DigitalOcean sell incredibly cheap compute power.
As for myself, I have a subscription to two of these: DigitalOcean and AWS. The money I've spent is close to nothing compared to what I would've spent on a 128GB RAM desktop.
AWS (Amazon Web Services)
AWS is currently the biggest company in the Market of cloud computing services.
Sooner or later you will have to use AWS or a similar cloud service.
If you plan on doing more intense stuff (neural networks, support vector machines on a lot of data), even the most powerful desktop GPU will not cut it and you will be better off renting an AWS instance.
Note that AWS has a free tier for you to get started with, so you've got nothing to lose at this point.
Like I said, it’s not just about the need for unlimited computing power but also the fact that this is a SKILL you must learn if you ever want to land that 250k a year salary.
Using VNC (Virtual Network Computing)
You basically build the ideal (i.e. powerful) data analytics computer desktop.
Then you buy any cheap laptop of your choice, keep that powerful desktop running, and use remote access software like TeamViewer, AnyDesk, or TightVNC.
The problem here is that things will still be slow if your (image) data sets reach 100GB or more, unless you buy a stack of NVIDIA GPUs (24GB vRAM maximum per card, so you would need 4x RTX 4090s). If your datasets are much smaller (<30GB), then it is a good option.
Amazon AWS EC2
I actually did the above, but the problem was that I started working with image data in the 50GB+ range and things slowed down MASSIVELY. I got fed up and have used Amazon AWS EC2 for deep learning/machine learning ever since.
The service is very similar, though: you make your own virtual computer with any OS and any software of your choice. You could go as far as making it your only work device.
For example, I installed a web-based IDE for R on it (RStudio Server), then pointed my browser at the EC2 server's address and used R as if it were my very own personal computer.
Thus whenever I wanted to work, I could do it through any computer with an internet connection by simply visiting the site leaving all the processing to the server.
Cost: depends on your choice of processor, RAM, and GPU. Currently, there's a 1-year free tier which lets you use a server at no cost (though with the lowest specs of them all).
Work with the server through any device with an internet connection and a keyboard.
Files are easy to access. No need to download anything just use and view them through the server.
Much less expensive than a powerful laptop
Server can be programmatically designed to scale depending on analysis needs using an API
If your laptop screen is small, you will struggle. It’s best to use a 15” or 17” screen if working from a laptop.
If your internet connection is slow, then your workflow will be slow too.
Can take some time to adjust.
7. OS: Mac vs Windows vs Linux
For some it may seem like only Mac and Linux are the way to go, but it all comes down to preference. Most of the packages you will need work across all platforms (Octave and R are good examples and have been available on all OSs for ages).
Using Python on UNIX-like systems (both macOS & Linux) is much easier due to better access to packages and package management.
Since Python is one of the most widely used languages for Data Science, you may think these two OSs are your best option.
That's true, partly because you'll get quick, early access to the latest libraries. That doesn't mean you should not buy a Windows laptop, because you can install Linux on any Windows laptop.
If you do use Windows on a Windows laptop, you may have to wait for libraries to be compiled as Windows binaries, though.
If you're working solely on Windows, even with the new Windows Terminal, you will still need a lot of tweaks to set up all your algorithms and scripts for data science, especially for sporadic third-party libraries whose documentation is written solely for UNIX systems. The most widely used libraries and tools (MatLab, S-Plus and SPSS, Python, Pandas, the machine learning/deep learning frameworks, and databases like PostgreSQL/MySQL) do have Windows versions and nice Windows documentation, though.
Note that I'm NOT referring to Windows LAPTOPS; I'm referring to the OS.
Cheaper Hardware, dGPUs and more
Windows laptops will give you the cheapest hardware, more powerful GPUs, and more RAM than MacBooks (128GB on workstation Windows laptops vs 64GB on the latest MacBook).
Unlike Macs, you can upgrade the RAM on most Windows laptops (up to a limit set by the motherboard).
If you have any questions, comments, or suggestions, please leave a comment below. Your input is taken seriously and will be used for future updates.
- I am a physicist and electrical engineer. My knowledge of computer software and hardware stems from years spent doing research in optics and photonic devices and running simulations in various programming languages. My goal was to work for the quantum computing research team at IBM, but I'm now working on astrophysical simulations in Python. Most of the science-related posts are written by me; the rest have different authors, but I edited the final versions to fit the site's format.