manuelsolan-o - Topological Data Analysis for Time Series

Introduction

In today’s data-driven world, time series analysis has become a cornerstone of many scientific and business applications. From forecasting stock prices and weather patterns to monitoring heart rates and industrial processes, the ability to understand and predict time-dependent data is invaluable. Traditional methods of time series analysis, while powerful, often struggle with the complexity and high dimensionality of modern datasets. This is where Topological Data Analysis (TDA) comes into play.

TDA is a rapidly growing field that leverages topological features to analyze complex data structures. While topology is traditionally a branch of pure mathematics, TDA has found significant applications in data science, particularly for time series analysis. This blog aims to introduce the fundamental concepts of TDA and illustrate how these concepts can be applied to understand and analyze time series data effectively.

Basic Concepts

Before going further, we should define a few basic concepts:

Topology: Topology is a field in mathematics that studies the properties of space that are preserved under continuous transformations. Key topological features include connected components, tunnels, and voids.
Topological Data Analysis (TDA): TDA refers to methods that utilize topological features of data to extract meaningful insights. These methods are especially useful for analyzing the shape and structure of data, providing a unique perspective beyond traditional statistical methods.
Persistent Homology: Persistent homology is a technique in TDA used to measure the topological features of a data set across different spatial resolutions. It tracks the birth and death of topological features (e.g., connected components and loops) as one varies a scale parameter, summarizing this information in a persistence diagram.
Simplicial Complexes: A simplicial complex is a set of simplices (points, line segments, triangles, etc.) that generalize the notion of a graph to higher dimensions. In TDA, data is often represented as a simplicial complex to study its topological properties.
Persistence Diagram: A persistence diagram is a summary statistic used in TDA to represent the birth and death of topological features across different scales. Each point in the diagram corresponds to a topological feature, with its coordinates representing the scale at which the feature appears and disappears.
Takens’s Embedding Theorem: This theorem justifies turning a time series into points in a higher-dimensional space by stacking delayed copies of the series (a delay embedding). Under suitable conditions on the underlying dynamical system, the embedding preserves the topological properties of the system's state space. This transformation is what makes TDA applicable to time series data.
Point Cloud: A point cloud is a set of data points in a metric space. In TDA, a time series can be transformed into a point cloud with a delay embedding, as justified by Takens’s embedding theorem, so that topological methods can be applied to its structure.
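As a minimal sketch of the delay embedding behind the last two definitions, the base R function below maps a series x[1], ..., x[n] to the points (x[t], x[t + tau], ..., x[t + (d - 1) * tau]). The function name delay_embed, the sine-wave input, and the parameter values are assumptions made for illustration; they do not come from the paper or from any package.

```r
# Delay embedding: row t is (x[t], x[t + tau], ..., x[t + (d - 1) * tau]).
# `delay_embed` is a hypothetical helper written for this illustration.
delay_embed <- function(x, d, tau = 1) {
  n_points <- length(x) - (d - 1) * tau
  if (n_points < 1) stop("series too short for this d and tau")
  # Index matrix: entry (t, j) is t + (j - 1) * tau.
  idx <- outer(seq_len(n_points), (seq_len(d) - 1) * tau, "+")
  matrix(x[idx], nrow = n_points, ncol = d)
}

# A sine wave embedded in 2 dimensions traces out a closed loop.
t_grid <- seq(0, 4 * pi, length.out = 200)
cloud <- delay_embed(sin(t_grid), d = 2, tau = 25)
dim(cloud)  # 175 rows (points), 2 columns (coordinates)
```

With d = 5 and tau = 1 this reduces to the sliding window used in the Bitcoin example later in this post.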

Summary of the Article “Topological Data Analysis (TDA) for Time Series” ¹

Authors: Nalini Ravishanker and Renjie Chen, Department of Statistics, University of Connecticut

Abstract: The paper explores the application of Topological Data Analysis (TDA) to time series data. TDA, initially a topic in pure mathematics, has grown to include methods that analyze topological features like connected components, tunnels, and voids in data. The paper reviews these methods and provides examples using R functions. TDA-derived features are shown to be useful for classifying, clustering, and detecting patterns in time series data.

Introduction: Topological Data Analysis (TDA) is an emerging field that applies algebraic topology to analyze complex data. TDA focuses on understanding the shape of data by examining its topological features. Computational topology involves measuring and representing these features using low-dimensional representations. Persistent homology, a key method in TDA, captures topological features across multiple scales. The article discusses how these techniques can be applied to time series data, which typically lack natural point cloud representations.

Persistent Homology and Point Clouds: Persistent homology measures the topological features of shapes and functions by converting data into simplicial complexes. These complexes are used to describe the topological structure of a space at different resolutions. More persistent features are likely to represent true underlying structures rather than noise. The persistence diagram, a popular summary statistic in TDA, records the birth and death of these features.
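To make "birth and death" concrete, here is a minimal base R sketch for dimension 0 only. In a Vietoris-Rips filtration every point is born as its own connected component at scale 0, and a component dies when it merges with another one, which happens exactly at the merge heights of single-linkage clustering. The simulated data and variable names are assumptions for illustration. Scale is measured here as the distance between points, and conventions differ between tools. This is not the paper's code, and it does not compute loops or voids.

```r
set.seed(1)
# Two well-separated noisy clusters in the plane (simulated for illustration).
cloud <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.1), ncol = 2)
)

# Single-linkage merge heights are the death scales of connected components.
# With n points there are n - 1 finite deaths; the last component never dies.
merge_heights <- hclust(dist(cloud), method = "single")$height
h0 <- data.frame(birth = 0, death = sort(merge_heights))
h0$persistence <- h0$death - h0$birth

# Most deaths are tiny (noise within a cluster); one is large, because the
# two clusters only join once the scale reaches the gap between them.
tail(h0, 3)
```

The single large persistence value is the kind of feature the text calls "true underlying structure", while the many small ones behave like noise.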

TDA for Time Series via Point Clouds: Time series data are transformed into point clouds using Takens’s embedding theorem, which preserves topological properties. The choice of delay parameter (τ) and embedding dimension (d) is crucial for accurate representation. The paper illustrates the process of generating point clouds from time series and constructing persistence diagrams.

Persistent Homology Based on Functions: TDA can also be applied to continuous functions. This involves evaluating the function on a grid and using a sublevel set filtration to construct persistence diagrams. For noisy data, the distance-to-measure (DTM) function is a robust, smoothed alternative to the distance function that helps reveal significant topological features.
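A sublevel set filtration can be sketched in base R for the simplest case, a function sampled on a one-dimensional grid. As the threshold rises, each local minimum gives birth to a component, and when two components meet at a local maximum the younger one dies. The helper name sublevel_persistence and the toy input are assumptions for illustration. The sketch covers dimension 0 only and does not implement the DTM function.

```r
# 0-dimensional sublevel-set persistence of a function sampled on a 1D grid.
# `sublevel_persistence` is a hypothetical helper written for this sketch.
sublevel_persistence <- function(f) {
  n <- length(f)
  parent <- rep(NA_integer_, n)   # NA = grid point not yet in the sublevel set
  birth <- rep(NA_real_, n)       # birth value stored at each component root
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }
  pairs <- list()
  for (i in order(f)) {           # sweep the threshold upward
    parent[i] <- i
    birth[i] <- f[i]
    for (j in c(i - 1L, i + 1L)) {
      if (j < 1L || j > n || is.na(parent[j])) next
      ri <- find_root(i)
      rj <- find_root(j)
      if (ri == rj) next
      # Elder rule: the component with the higher birth value dies here.
      young <- if (birth[ri] > birth[rj]) ri else rj
      old <- if (young == ri) rj else ri
      if (f[i] > birth[young]) {  # skip zero-persistence pairs
        pairs[[length(pairs) + 1L]] <- c(birth = birth[young], death = f[i])
      }
      parent[young] <- old
    }
  }
  # The component of the global minimum never dies.
  pairs[[length(pairs) + 1L]] <- c(birth = min(f), death = Inf)
  as.data.frame(do.call(rbind, pairs))
}

# Toy function with local minima at 0, 1 and 0.5.
pd_toy <- sublevel_persistence(c(0, 2, 1, 3, 0.5, 4))
pd_toy
```

For this toy input the diagram has three points: (1, 2), (0.5, 3) and (0, Inf). The shallow minimum at 1 dies quickly, while the deeper minimum at 0.5 persists longer.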

TDA of Time Series via Frequency Domain Functions: The paper examines the use of frequency domain representations, such as second-order spectra and Walsh-Fourier transforms, to analyze time series. These methods involve constructing persistence diagrams from smoothed periodograms or categorical time series data.
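As a rough illustration of the first ingredient, the sketch below computes a smoothed periodogram with spec.pgram from base R's stats package. The simulated series, the spans values, and the variable names are assumptions; the paper's own examples may use different settings. The resulting spectrum is a function of frequency, so it could in principle be passed to a sublevel set filtration like any other function.

```r
set.seed(42)
n <- 200
# Simulated series: one cycle every 10 observations plus noise (an assumption).
x <- sin(2 * pi * 0.1 * seq_len(n)) + rnorm(n, sd = 0.3)

# Periodogram smoothed with modified Daniell kernels; `spans` is a tuning choice.
sp <- spec.pgram(x, spans = c(5, 5), taper = 0.1, plot = FALSE)

# Frequency (cycles per observation) at which the smoothed spectrum peaks.
peak_freq <- sp$freq[which.max(sp$spec)]
peak_freq
```

The peak should sit close to the simulated frequency of 0.1, and wider spans trade frequency resolution for a smoother estimate.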

Feature Construction Using TDA: Persistence landscapes, introduced by Bubenik, provide statistical summaries of persistence diagrams that are useful for machine learning. These landscapes allow for statistical inference and maintain the stability of topological features. The paper describes the steps to construct persistence landscapes and how they can be used to compare and analyze time series data.
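The construction can be sketched in a few lines of base R. Each diagram point (birth, death) contributes a "tent" function max(0, min(t - birth, death - t)), and the k-th landscape layer at t is the k-th largest tent value. The helper name landscape, the k_max argument, and the toy diagram are assumptions for illustration, not the paper's code.

```r
# Persistence landscape: lambda_k(t) is the k-th largest value of
# max(0, min(t - birth, death - t)) over all points of the diagram.
# `landscape` is a hypothetical helper written for this sketch.
landscape <- function(birth, death, t_grid, k_max = 2) {
  # One tent function per diagram point, evaluated on the grid.
  tents <- sapply(seq_along(birth), function(i) {
    pmax(0, pmin(t_grid - birth[i], death[i] - t_grid))
  })
  tents <- matrix(tents, nrow = length(t_grid))
  # Sort each grid row in decreasing order and keep the first k_max layers
  # (zero padding covers diagrams with fewer than k_max points).
  layers <- t(apply(tents, 1, function(v) {
    sort(c(v, rep(0, k_max)), decreasing = TRUE)[seq_len(k_max)]
  }))
  layers <- matrix(layers, nrow = length(t_grid))
  colnames(layers) <- paste0("lambda_", seq_len(k_max))
  layers
}

# Toy diagram with two features: (0, 2) and (1, 3).
toy_landscape <- landscape(birth = c(0, 1), death = c(2, 3),
                           t_grid = c(1, 1.5, 2))
toy_landscape
```

Because each layer is an ordinary function on a fixed grid, landscapes from different time series can be averaged or compared as numeric vectors, which is what makes them convenient as machine learning features.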

Conclusion: TDA offers powerful tools for analyzing time series data by focusing on their topological structure. The methods discussed in the paper, including persistent homology and persistence landscapes, provide a new perspective for understanding complex data. The paper highlights the practical applications of these techniques in various domains and suggests further exploration of TDA for time series analysis.

Practical example: Bitcoin’s Price Prediction

Descriptive Analysis

Time Series Smoothing with the Holt-Winters Exponential Filter

ARIMA Model

Reflecting on the analysis, it is evident that while the ARIMA model can be used to make predictions on this time series data, there is a significant amount of error. This highlights the challenges of modeling and forecasting in time series data, particularly when the data exhibit complex behaviors or trends that are not easily captured by traditional models.

Despite the errors, classical methods like ARIMA still provide valuable insights into the time series. Classical decomposition, in particular, splits the data into its constituent components (trend, seasonality, and noise), which helps in understanding the underlying patterns and making informed decisions.
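One way to put a number on forecast error is to hold out the end of the series and compare predictions against it. The sketch below does this with arima and predict from base R's stats package. The simulated random walk, the ARIMA(1,1,0) order, and the 30-step holdout are assumptions chosen for illustration; they are not the model or the data used in this post.

```r
set.seed(7)
# Simulated random walk standing in for a price series (not the Bitcoin data).
y <- cumsum(rnorm(300))
train <- y[1:270]
test <- y[271:300]

# ARIMA(1,1,0) is an arbitrary order chosen for illustration.
fit <- arima(train, order = c(1, 1, 0))
fc <- predict(fit, n.ahead = length(test))

# Root mean squared error of the forecast on the 30 held-out observations.
rmse <- sqrt(mean((test - as.numeric(fc$pred))^2))
rmse
```

Comparing this error with that of a naive forecast (repeating the last observed value) is a quick check on whether the model adds anything.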

Topological Data Analysis (TDA)

library(TDAstats)
library(tidyverse)

# Load the data from the CSV file
bitcoin_data <- read.csv("media/Precio_Bitcoin.csv")

# Convert the price column to numeric (strip thousands separators first)
bitcoin_data$Precio <- as.numeric(gsub(",", "", bitcoin_data$Precio))

# Check the first rows
head(bitcoin_data)

         Fecha   Precio
1  Aug 7, 2021 44555.80
2  Aug 8, 2021 43798.12
3  Aug 9, 2021 46365.40
4 Aug 10, 2021 45585.03
5 Aug 11, 2021 45593.64
6 Aug 12, 2021 44428.29

# Set the window size
window_size <- 5

# Number of sliding windows that fit in the series
n_windows <- nrow(bitcoin_data) - window_size + 1

# Create a list to store the points
points_list <- vector("list", n_windows)

# Build the points with a sliding window (a delay embedding with delay 1)
for (i in seq_len(n_windows)) {
  points_list[[i]] <- bitcoin_data$Precio[i:(i + window_size - 1)]
}

# Convert the list of points into a matrix (one row per window)
points_matrix <- do.call(rbind, points_list)

# Standardize each column of the point matrix
points_matrix <- scale(points_matrix)

# Compute persistent homology up to dimension 1 (components and loops)
pd <- calculate_homology(points_matrix, dim = 1)

head(pd)

     dimension birth       death
[1,]         0     0 0.006964807
[2,]         0     0 0.007599224
[3,]         0     0 0.008485967
[4,]         0     0 0.008755101
[5,]         0     0 0.009313970
[6,]         0     0 0.010296019

library(ggplot2)

# Create data frame
pd_df <- as.data.frame(pd)

# Create Persistence Diagram
ggplot(pd_df, aes(x = birth, y = death)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(title = "Persistence Diagram",
       x = "Birth",
       y = "Death")

In this persistence diagram:

Each point represents a topological feature. The x-axis shows the scale at which the feature is born, and the y-axis shows the scale at which it dies. The dashed diagonal line marks where birth equals death, so points close to it are features that disappear almost as soon as they appear (low persistence). Features farther from the diagonal are more persistent and generally more significant.

This analysis allows you to understand the underlying topological structure in your Bitcoin price data, identifying patterns and features that persist across different scales.
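Distance from the diagonal can be made explicit by computing persistence as death minus birth and sorting by it. The sketch below uses a small made-up matrix with the same column layout as the output shown above (dimension, birth, death). The values and the names pd_toy, pd_toy_df, and ranked are assumptions for illustration, not results from the Bitcoin data.

```r
# Toy matrix in the same column layout as the homology output above;
# the values are made up for illustration.
pd_toy <- cbind(
  dimension = c(0, 0, 0, 1, 1),
  birth     = c(0, 0, 0, 0.20, 0.35),
  death     = c(0.01, 0.02, 0.90, 0.25, 1.10)
)

pd_toy_df <- as.data.frame(pd_toy)
# Persistence is the vertical distance of a point above the diagonal.
pd_toy_df$persistence <- pd_toy_df$death - pd_toy_df$birth
# Most persistent features first.
ranked <- pd_toy_df[order(-pd_toy_df$persistence), ]
ranked
```

In this toy table one connected component (row 3) and one loop (row 5) stand out, while the remaining rows sit close to the diagonal.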

These data show linear behavior. In the context of topological data analysis (TDA), and of a persistence diagram specifically, this can be interpreted in several ways, depending on the type of data and the context of the analysis. Here are some possible interpretations:

Short-Lived Connected Components:

If most points in the persistence diagram are near the diagonal line, this suggests that the topological features (connected components) appear and disappear quickly. This indicates that there are no significant persistent features in the data, which may signal noise or a simple underlying structure.

Lack of Complex Topological Structure:

Linear behavior may suggest that the data lack a complex topological structure: there are no cycles or holes that persist across different scales. This could be the case for data following a simple linear trend. If the data themselves behave linearly, the persistence diagram will tend to show only short-lived features, because linear data have few topological “holes” or cycles.

Classical Methods vs Topological Data Analysis

While TDA offers a unique and promising perspective, classical methods remain highly valuable for time series analysis, especially when dealing with linear trends. Combining classical methods with emerging techniques like TDA lets us draw on the strengths of each, giving a more comprehensive understanding of the data and, potentially, more accurate predictions.

Conclusions

Reflecting on the novelty of the field, it is important to recognize that topological data analysis (TDA) is still a developing theory. As with any emerging area, it is prone to errors and misinterpretations. This is normal and expected as the field matures and becomes more refined.

Given the linear behavior observed in the time series data, classical methods currently provide more insightful and reliable information for this specific dataset. Linear data tend to lack complex topological features, such as persistent holes or cycles, which are central to TDA.

Therefore, it is crucial to remain open to different approaches and methodologies when solving problems. While TDA offers a unique perspective and has potential for uncovering complex structures in data, it should be used in conjunction with classical methods to provide a comprehensive analysis. By integrating various techniques, we can leverage the strengths of each approach and achieve more robust and accurate results.

Footnotes

¹ Ravishanker, N., & Chen, R. (2019). Topological Data Analysis (TDA) for Time Series. Department of Statistics, University of Connecticut. Retrieved from https://arxiv.org/abs/1909.10604