Encounters with Logarithms in Data Science: Where They Arise

In data science, one of the questions aspiring practitioners ask most often is, “How much mathematics do I really need to know?” The typical answer begins with statistics and extends to calculus and linear algebra, but what often goes unsaid is precisely where you’ll encounter these concepts. In this post, we will shed light on one mathematical concept in particular: logarithms.

Data Transformation

When data is collected, it seldom aligns perfectly with our analytical desires. There are instances where we need to manipulate the data to enhance our ability to draw inferences, build models, and uncover deeper insights. Data transformation involves rescaling the data using mathematical functions, and its purpose can range from improving model performance to enhancing interpretability, or even addressing computational requirements. The application of logarithmic transformations can reveal hidden insights within the data, reduce skewness, and aid in modeling, particularly when dealing with nonlinear relationships.
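As a minimal sketch of what this looks like in practice (the income figures below are purely illustrative), a logarithmic transformation can compress a long right tail:

```python
import numpy as np

# Hypothetical right-skewed data, e.g. annual incomes (illustrative values)
incomes = np.array([28_000, 35_000, 42_000, 55_000, 73_000, 250_000, 1_200_000])

# log1p computes log(1 + x), which handles zeros gracefully
# and compresses the long right tail of the distribution
log_incomes = np.log1p(incomes)

# The relative spread (coefficient of variation) shrinks dramatically
print(incomes.std() / incomes.mean())          # roughly 1.7 before
print(log_incomes.std() / log_incomes.mean())  # roughly 0.1 after
```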

Demystifying Logistic Regression: Bridging the Gap Between Regression and Classification

The term “logistic regression” might seem misleading, suggesting a regression task, but in reality it is a powerful tool used primarily for classification problems. If you’ve come across it in the context of generalized linear models (GLMs) and found yourself thinking, “The graph (illustrated below) doesn’t appear linear at all,” you’re not alone. However, logistic regression is indeed linear, just not on the probability scale: the linearity lives in a transformed space.

In the graph, the Y-axis represents probability, which must always fall within the range of 0 to 1. In logistic regression, however, the Y-axis undergoes a transformation from probability p to the log(odds), log(p / (1 − p)), which spans the entire real number line from negative infinity to positive infinity. On that scale the model is a straight line, so the coefficients convey valuable information: a one-unit increase in an explanatory variable corresponds to an increase in the log(odds) by the coefficient value.

[Figure: the S-shaped logistic curve mapping the explanatory variable to probability. Image from DataCamp]
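To make the coefficient interpretation concrete, here is a minimal sketch using scikit-learn on synthetic data (the data-generating process below is an assumption chosen purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: one explanatory variable, binary outcome (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

beta = model.coef_[0, 0]
# Each unit increase in X shifts log(odds) = log(p / (1 - p)) by beta,
# which is equivalent to multiplying the odds by exp(beta)
print(f"coefficient (change in log-odds per unit of X): {beta:.3f}")
print(f"equivalent odds ratio: {np.exp(beta):.3f}")
```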

Unraveling Log Likelihood: A Crucial Concept in Data Science

The term “likelihood” is often encountered in data science, represented as L(distribution | data). While in everyday language, “probability” and “likelihood” are sometimes used interchangeably, they have distinct meanings, although they may overlap in specific cases. This discussion won’t delve into the intricacies of their differences but will explore their applications in data science.

In certain scenarios, especially in techniques like Gaussian Naive Bayes, many likelihoods must be calculated and multiplied together. Because each factor can be extremely close to zero, the product can become so small that it falls below the smallest value a floating-point number can represent, a computational failure known as “underflow.” To overcome this, data scientists work with “log likelihoods” instead: taking the logarithm turns a product of many tiny values into a sum of moderately sized negative numbers, which is easy to represent and effectively mitigates the underflow problem.
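A quick sketch makes the underflow problem tangible (the likelihood values below are made up for illustration):

```python
import numpy as np

# 1,000 small likelihood values, each 1e-5 (made-up for illustration)
likelihoods = np.full(1000, 1e-5)

# Multiplying directly underflows: 1e-5000 is far below the smallest
# positive float64, so the product collapses to exactly 0.0
print(np.prod(likelihoods))         # 0.0

# Summing log-likelihoods stays comfortably representable
print(np.sum(np.log(likelihoods)))  # about -11512.9
```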

Cost Function

In the realm of data science, the term “cost function” refers to what we aim to optimize when fitting a model. Some of these functions, such as “log loss,” incorporate logarithms as integral components. So, if you encounter logarithms in cost functions, don’t be surprised!
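As one example, binary log loss (also called cross-entropy) can be written in a few lines. The implementation below is a bare-bones sketch rather than a production metric; scikit-learn ships its own version as sklearn.metrics.log_loss:

```python
import numpy as np

def binary_log_loss(y_true, y_pred, eps=1e-15):
    """Average negative log-likelihood of binary labels under predicted probabilities."""
    # Clip predictions away from 0 and 1 so log() never sees an exact zero
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.3])
print(binary_log_loss(y_true, y_pred))  # about 0.41
```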

These are just a few of the prominent areas where logarithms play a crucial role in data science. It’s highly likely that you’ll encounter them in other contexts as well.

I hope you found this information enjoyable and insightful!
