Academy Knowledge Base

Build an AI Data Assistant with Streamlit, LangChain and OpenAI
In the rapidly evolving field of data science, having a reliable assistant can significantly enhance your productivity. In this article, we'll build an AI assistant using Streamlit, LangChain and OpenAI models that will transform the way you explore and analyse data. The AI assistant app we build ca...more

Dr Ana Rojo-Echeburúa

Inverse Problems in Science and Engineering
Although inverse problems are often studied and solved by statisticians and computer scientists, they are ubiquitous and appear in all scientific and engineering fields. This article aims to demystify the essence of inverse problems and underline their applicability to real-world appli...more

Dr Alexander Mead

senSiteUQ: A New Paradigm for Sewer Network Monitoring
Deciding how many sensors to purchase and where to put them is currently heuristically driven. The cost of sensors, physical constraints such as the ease of access and IoT connectivity, and expert opinion all factor in, but the heuristic approach does not allow for measurable detection targets and q...more

Dr Mikkel Lykkegaard

autoEPC: A Step Change for Energy Efficiency
autoEPC is a cloud-based software application that assesses property EPC ratings quickly, cost-effectively, and reliably, with minimal input from the user. It is built on digiLab's industry-leading machine learning platform twinLab, providing robust, data-driven predictions with Uncertainty Q...more

Dr Mikkel Lykkegaard

Python In Excel, What Impact Will It Have?
On the 22nd August, Microsoft made a major announcement: they would be implementing Python in Excel. At the time of writing, this has only been released in a Beta test environment and is only available to those on the Microsoft 365 Insiders program. The announcements do, however, give us a tantalisin...more

Richard Warburton

Understanding Transformers: One-Hot Encoders and First-Order Sequence Models
This is the first in our series of blog posts where we start to break down the powerful set of tools known as Transformers. They are the underpinning technology that has catapulted Large Language Models (LLMs) from a Machine Learning research field into the public spotlight! Before we delve into t...more

Prof Tim Dodwell

Introducing Active Learning
Active Learning is a powerful statistical technique that augments the efficiency and effectiveness of machine-learning models. Recognizing that our user base includes engineers, who may not be intimately familiar with machine learning or Bayesian statistics, this post will elucidate the concept and ...more

Dr Alexander Mead

How to Clean Data in Excel Using Python: The Basics
In part three of our data cleaning series, we move beyond manual methods and dive into Python's robust features for advanced data cleaning. We'll walk through Python equivalents for familiar Excel tasks such as splitting columns, applying formulas, and removing duplicates. With a real-world example ...more

Richard Warburton

Supercharge Your Microsoft Excel Data Cleaning with Python!
In today's data-driven world, the importance of clean data cannot be overstated. This article delves into the intricacies of data cleaning in Microsoft Excel. We’ll explore the process of cleaning data and assess the options we have available to clean data. We’ll also demonstrate how to use Python t...more

Richard Warburton

What Does Clean Data Look Like in Microsoft Excel?
As data analysts, we are often tasked with providing key insights and statistics at short notice. However, the datasets we are provided with are rarely fit for purpose. Very little is written about the best practice for capturing clean data in an Excel Spreadsheet, something this article rectifies.

Richard Warburton

Large Scale Uncertainty Quantification
UM-Bridge is an open-source software framework that enables seamless integration between UQ software packages such as PyMC, emcee, MUQ, QMCPy, SGMK, and tinyDA and any containerised modelling software, no matter which language either is written in. Additionally, UM-Bridge makes use of the latest tec...more

Dr Mikkel Lykkegaard

Receiver Operating Characteristic Curve...the What, Why and How
I have to admit that the Receiver Operating Characteristic (ROC) curve is probably one of the least memorable names in Machine Learning! In this explainer, we will look at what the ROC curve communicates for a classification model, and then give particular attention to the area under the curve (or a...more

Prof Tim Dodwell

Supervised vs unsupervised learning...and more
Supervised vs Unsupervised Learning refers to the two main Machine Learning paradigms. There is no competition between the two: they simply use different algorithms for different tasks. As a Machine Learning engineer, your task is to identify which class of algorithm you need for a given challenge a...more

Prof Tim Dodwell

twinLab Case Study: Emulating Cosmological Structure with twinLab
In this article, we will explore how twinLab can be used to create an emulator for the statistical properties of structure in the universe. This emulator can then be used to generate mock maps of the matter distribution across vast regions of space. This reduces the need for computationally expensiv...more

Dr Alexander Mead

Rebalancing your data with the Synthetic Minority Oversampling Technique (aka SMOTE)
In classification problems, imbalanced datasets are really common. This causes particular problems when building good machine learning classifiers. If we don't rebalance the data, we see that models are particularly susceptible to giving false negative results - this is where the algorithm wrongly c...more

Prof Tim Dodwell

LangChain - A Toolbox for Supercharging Large Language Models
In the span of six months, we’ve gone from having our minds blown by the power of ChatGPT to a world where the average person, with just a sprinkling of Python, can build their own custom semi-autonomous agents. LangChain is the new library making all this possible. It packages up a whole host of to...more

Dr Seán Carroll

Ethics and Machine Learning
Machine learning and ethics are necessary bedfellows in order to ensure that new, groundbreaking algorithms can be used equitably and do not tip the scale of ethics and technology inappropriately. This blog post will break down some of these implications and provide practical, implementable advice f...more

Michelle Fabienne Bieger

What is the Curse of Dimensionality in Machine Learning?
If you have ever dipped your toe into the world of Machine Learning, then chances are that you have come across the term the 'Curse of Dimensionality'. In this blog post, we'll explore why the Curse of Dimensionality is an important concept in Machine Learning and Data Science, how it shapes our und...more

Prof Tim Dodwell

How to Build a Digital Twin of a City
We build digital twins to better make sense of complex, inter-woven, multi-headed problems. In this post we will walk through an exciting new venture we have been working on at digiLab: building a digital twin of a city for green energy planning, net-zero attainment and sustainability. We are callin...more

Dr Andy Corbett

Generative AI is Changing the World
We’re living through an explosion in the use of consumer-facing AI tools. This technology is starting to have a profound impact on how we work. Recently, we’ve used Midjourney to help generate the course images for our collection of AI in the Wild online courses. In this post, I’ll give a quick summ...more

Dr Seán Carroll

Promoting Equality, Diversity, and Inclusion in AI and ML
EDI is essential for creating fair and inclusive AI and ML solutions. Without a focus on EDI concerns, there is a risk of creating solutions that are biased, which can have negative impacts on individuals and society as a whole, and also on the organisations who are utilising the solution in their d...more

The digiLab Team

Which ML Experts should you Follow in 2023?
We're passionate about using AI/ML in industry, to solve engineering and sustainability challenges. So we thought we'd make a list which is a bit more personal and a bit less mainstream. We're excited at the moment by PINNs, Uncertainty Quantification, Gaussian Processes, and Spiking Nets: our list ...more

The digiLab Team

Learning AI/ML to make a positive impact on sustainability: 11 sectors
In the words of Billy Joel, 'We didn't start the fire'. The sustainability crisis has been brewing for decades. But now, advances in AI/ML mean we have the technology and tools to start putting the fire out - across multiple sectors.

Prof Tim Dodwell

Data Cleaning: What, Why, How?
Missing data is just one of the problems that most data scientists encounter every day. And nonsense data being recognised as such is, in fact, the best-case scenario. Missing data can be encoded in many different ways (I have seen missing data labelled as -1, -999, ‘missing’, ‘went for a coffee’ an...more

Dr Mikkel Lykkegaard

Machine Learning vs Data Science Careers
Are you interested in a career in data science or machine learning? There are many similarities between these two fields, but there are also some key differences that you should be aware of. Knowing the differences between these two disciplines can help you decide which one is best for your career g...more

Prof Tim Dodwell

DaFT: DerivAtive-Free Thinning
Data thinning, core subset selection, and data compression are critical but often overlooked techniques in data science, machine learning and data-driven science and engineering. When executed correctly, they can lead to massive speed-ups of your analytics pipeline. In this article, we will unpack t...more

Dr Mikkel Lykkegaard

Get a Machine Learning Job - Tools and Skills You Need to Know
In article 4/8 of our series, 'Getting a Machine Learning Job', we examine the tools/skills which companies will want to see on your CV/resume and will want to test at interview. You don’t need to be an expert in any, let alone all of them for your first job or internship – but you must be able to d...more

Prof Tim Dodwell

Get a Machine Learning Job - 8 Interview Questions We Ask!
In this article, we take a look at the machine learning interview and those dreaded machine learning engineer questions. So if you're preparing for your first machine learning job interview or you're in the process of looking for that killer data science internship, read on!

Prof Tim Dodwell

Get a Machine Learning Job - How to get a data science internship
How do you land your first job in Machine Learning? Understanding the difference between supervised and unsupervised methods is one thing – but the best candidates already have a data science internship under their belt. Here are four key strategies (and some small hints) for getting that data scienc...more

Prof Tim Dodwell

Get a Machine Learning Job in 2023 - Six Top Machine Learning Books
So what is the strategy for improving your chances of landing a machine learning job? There are lots of good things you can do to move forward in your Machine Learning journey, but sometimes you just need a good book!

Prof Tim Dodwell

Neural Networks: Three Key Types
Neural Networks are often seen as mysterious “black boxes”. In this article, we want to unpack them a little bit – or at least identify the three fundamental types! We’ll explore how they function, and what sort of problems they are good at solving.

Dr Andy Corbett