
# Introduction
Keeping up with data science is not always easy. New libraries, papers, datasets, and tools appear every day, and I can’t remember them all. I’ve found that just following newsletters or threads doesn’t really work. What helps more is keeping a few go-to resources ready: a small hub with research, code, datasets, visualizations, and quick references all in one place. After trying a bunch of things, I now have 10 bookmarks I use all the time. They help me stay focused, save time, and know what’s happening. I open them every morning, and they set the tone for my day. Here’s a look at my top bookmarks and why I keep them:
# 1. arXiv: Machine Learning (cs.LG) New Papers
arXiv is where I check the latest machine learning research. The cs.LG section covers everything from theory to applied machine learning in NLP, vision, and RL. I bookmark it and check often so I don’t miss papers that could inspire new ideas or projects. It’s a great way to stay ahead and learn about new methods before they hit articles or GitHub.
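If you would rather pull the newest cs.LG titles into a script than open the page, arXiv also offers a public query API that returns an Atom feed. Below is a minimal sketch using only the standard library. The function names (`build_query_url`, `parse_feed`, `fetch_latest`) and the default of five results are my own choices for illustration, not anything arXiv prescribes.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by Atom feeds
API_URL = "http://export.arxiv.org/api/query"


def build_query_url(category="cs.LG", max_results=5):
    """Build a query URL for the newest submissions in one arXiv category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def parse_feed(atom_xml):
    """Return a list of (title, link) pairs from an Atom feed (str or bytes)."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        # Titles are often wrapped across lines, so collapse the whitespace.
        title = " ".join(entry.findtext(f"{ATOM}title", default="").split())
        link = entry.findtext(f"{ATOM}id", default="")
        papers.append((title, link))
    return papers


def fetch_latest(category="cs.LG", max_results=5):
    """Download and parse the feed. This call needs network access."""
    url = build_query_url(category, max_results)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_feed(response.read())
```

Calling `fetch_latest()` and printing each pair gives a quick morning digest. The parsing is kept separate from the download so it can be checked against a saved feed without touching the network.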
# 2. GitHub Trending Python Repos
This page shows the most popular Python projects each week, from new libraries to experimental tools. I keep it bookmarked because data science isn’t just about algorithms; it’s also about tools. Scanning what’s trending helps me spot useful libraries or patterns early, before everyone else has picked them up. Just 10 minutes a week here usually gives me one or two things worth trying.
# 3. Data Is Plural
Data Is Plural is a newsletter and archive full of unusual and interesting datasets. I keep it bookmarked because it’s great for finding project ideas, tutorials, or hackathon challenges. Each dataset has a short description and a link. It’s an easy way to explore new data and get ideas beyond Kaggle or the usual sources.
# 4. The Rundown AI
The Rundown AI aggregates the top AI and machine learning news and papers, saving me hours of searching. Whether it’s a new paper, a tool release, or an emerging approach, it gives a quick overview so I can see what’s relevant. It’s a simple way to stay informed and keep up with trends.
# 5. RAWGraphs
RAWGraphs is a free, browser-based tool for making clean, customizable charts fast. I can create visualizations straight from CSV or JSON without writing complicated matplotlib or seaborn code. It’s great for spotting trends, outliers, or making charts for reports. The charts export easily in vector formats, so they look professional in slides or articles.
# 6. Quartz Bad Data Guide
The Quartz Bad Data Guide is one of my go-tos whenever I’m cleaning messy data. It goes over common problems like missing values, garbled text, inconsistent formatting, and misentered numbers, and gives tips on how to fix them. Messy data is just part of the job, and this guide saves me a lot of time troubleshooting. I also like how it’s structured by who should fix what, which makes tracking and solving issues a lot easier.
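To make a couple of those problems concrete, here is a small sketch of the kind of first-pass check I might run before cleaning a file. It is not taken from the guide. The list of missing-value markers, the `audit_csv` helper, and the sample columns are all assumptions made up for illustration.

```python
import csv
import io
from collections import Counter

# Strings that often stand in for "no value" (an assumed, incomplete list).
MISSING_MARKERS = {"", "na", "n/a", "null", "none", "-"}


def is_number(text):
    """True if the text parses as a plain float (no thousands separators)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def audit_csv(csv_text, numeric_columns):
    """Count missing-looking cells and non-numeric cells in numeric columns."""
    missing = Counter()
    not_numeric = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if column is None:  # extra cells beyond the header; skipped here
                continue
            text = (value or "").strip()
            if text.lower() in MISSING_MARKERS:
                missing[column] += 1
            elif column in numeric_columns and not is_number(text):
                not_numeric[column] += 1
    return dict(missing), dict(not_numeric)


# Hypothetical sample: one missing name, one "N/A", one number typed as "1,200".
SAMPLE = 'name,amount\nalpha,10\nbeta,N/A\ngamma,"1,200"\n,7\n'
missing_counts, format_issues = audit_csv(SAMPLE, numeric_columns={"amount"})
```

For the sample above, `missing_counts` flags one cell in each column and `format_issues` flags the `"1,200"` entry. A count like this only tells you where to look; deciding how to fix each case is still a judgment call.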
# 7. Five Minute Stats
Five Minute Stats is a quick reference for essential statistics concepts and formulas. I can easily refresh topics like hypothesis testing, probability distributions, correlations, and descriptive stats in just a few minutes. It’s perfect when checking calculations, prepping lessons, or writing tutorials without digging through textbooks.
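When I refresh a formula, I like to confirm it with a few lines of code. The sketch below is my own illustration, not something from the site: it computes basic descriptive statistics and a Pearson correlation with the standard library. The helper names `describe` and `pearson_r` are hypothetical.

```python
import math
import statistics


def describe(values):
    """Basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


def pearson_r(xs, ys):
    """Pearson correlation: co-deviation of x and y over the product of their spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return co_deviation / math.sqrt(ss_x * ss_y)
```

For example, `pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])` comes out at 1, since the second list is an exact multiple of the first.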
# 8. Awesome Data Analysis
Awesome Data Analysis is a GitHub collection of tools and resources for all parts of the data workflow. I keep it bookmarked because it covers cleaning, manipulating, and visualizing data, as well as building machine learning pipelines. Whether I’m trying new libraries, refreshing my toolkit, or sharing with colleagues or students, it helps me quickly find reliable, well-maintained tools.
# 9. Mockaroo
Mockaroo is a tool for generating random data and mock APIs. I can quickly create realistic datasets in CSV, JSON, SQL, or Excel without typing everything by hand. It’s great for testing code, dashboards, or machine learning workflows, including tricky edge cases. Mock APIs also let me work on frontend and backend at the same time.
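For a sense of what this kind of mock data looks like, here is a local stand-in written with Python's standard library. To be clear, this is not Mockaroo's API or output. The field names, the name pool, and the edge cases are all hypothetical and chosen just for illustration.

```python
import csv
import io
import random

FIELDS = ["id", "name", "age"]
# Values that tend to break naive code: empty, apostrophe, non-ASCII, very long.
EDGE_CASE_NAMES = ["", "O'Brien", "名前", "x" * 50]


def make_rows(count, seed=0):
    """Return `count` random rows followed by one row per edge-case name."""
    rng = random.Random(seed)  # seeded so the same call gives the same data
    rows = [
        {
            "id": i + 1,
            "name": rng.choice(["Ana", "Bilal", "Chen", "Dara"]),
            "age": rng.randint(18, 90),
        }
        for i in range(count)
    ]
    for row_id, name in enumerate(EDGE_CASE_NAMES, start=count + 1):
        # Boundary ages exercise validation logic.
        rows.append({"id": row_id, "name": name, "age": rng.choice([0, 120])})
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Something like `print(to_csv(make_rows(5)))` is enough to feed a quick test. A dedicated generator earns its place once you need many field types, realistic values, or a shareable schema.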
# 10. Foorilla
Foorilla is a platform for tech and data job listings. I use it to browse new openings, follow companies, and filter jobs by topic, location, or remote options. You can also export lists in CSV or JSON, which makes it easier to keep track of opportunities. It’s a simple way to stay updated on the job market without hopping between multiple sites.
Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.