What are interpolations, and what do we use them for in GIS?

Interpolation is a method used to estimate values at locations where data hasn’t been directly measured. It works by using known values from surrounding points to predict the unknown ones, creating a continuous surface. This surface helps in mapping and analysis, making sense of scattered data.

In this video, you’ll get a brief but clear explanation of interpolation and its purpose. It also gives an overview of some common techniques and highlights the role of GIS in the process.

Interpolation is based on a simple yet powerful concept: things that are close together tend to be more similar than those farther apart. This spatial correlation allows us to estimate values in areas where direct measurements are missing. In GIS, interpolation predicts values using a number of sample points, helping to create continuous surfaces from point data or contours—whether it’s elevation, rainfall, chemical concentrations, noise levels, or other spatial variables.

While the example above shows input points aligning with cell centers, real-world data is rarely this precise. One challenge of interpolation is that it can degrade the original data to some extent, meaning that even if a data point falls within a cell, the final raster value may not be an exact match. Despite this, interpolation remains one of the most effective tools for filling in gaps and making sense of spatial data.

Data Collection Methods

The reliability of interpolation depends on the quality and distribution of the input data, making sampling strategies a crucial part of the process. Since collecting data everywhere is often impractical, if not impossible, we rely on sampling—a smart shortcut that allows us to analyze an entire population by examining just a subset of it. By gathering data from a select number of points, we can make informed predictions about the bigger picture while controlling both the number of samples and the pattern in which we collect them.

In spatial analysis, different sampling strategies help ensure accuracy and efficiency:

Systematic Sampling

Systematic sampling involves collecting data points at regular, evenly spaced intervals, typically forming a grid or parallel lines. This method is straightforward, ensuring uniform coverage and making data collection structured and predictable by spacing samples at fixed X, Y intervals. One of its main advantages is its simplicity—it’s easy to understand and apply, reducing the likelihood of human error. However, systematic sampling also has its drawbacks. Since every location receives the same level of attention, it may not account for natural variations in the data. Additionally, staying precisely on the predefined sampling lines can be challenging, and there is a risk of introducing biases if the spacing unintentionally aligns with an underlying pattern in the landscape.
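To make the idea concrete, here is a minimal NumPy sketch of a systematic sampling design; the study-area bounds and the 100-unit spacing are placeholders, not values from any particular survey.

```python
import numpy as np

# Hypothetical study-area bounds and a fixed sampling interval (placeholder units).
xmin, xmax, ymin, ymax = 0, 1000, 0, 800
spacing = 100  # fixed X, Y distance between samples

# Lay out sample locations at regular, evenly spaced intervals (a grid).
xs = np.arange(xmin, xmax + spacing, spacing)
ys = np.arange(ymin, ymax + spacing, spacing)
grid_x, grid_y = np.meshgrid(xs, ys)
samples = np.column_stack([grid_x.ravel(), grid_y.ravel()])

print(f"{len(samples)} systematic sample points")
```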

Random Sampling

Random sampling involves selecting locations using a random number process to minimize bias. Once chosen, these points are plotted on a map and then visited for data collection. This method is useful when selecting a representative sample across a study area, ensuring that every point has an equal chance of being chosen—assuming no specific area is inherently more important than another for analysis. It is particularly effective in homogeneous areas where avoiding bias from manual selection is crucial. However, random sampling has some limitations. Since points are selected randomly, it does not guarantee an even distribution of samples, which can lead to underrepresentation in areas with high variability. Additionally, the randomness of point selection can make results harder to explain or justify, and some chosen locations may be impractical for field visits. Therefore, understanding both the study phenomenon and the terrain is essential to ensure that random sampling is the most appropriate and justifiable method for data collection.
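A comparable sketch for random sampling draws coordinates from a uniform distribution over the same hypothetical study area; the seed is fixed only so the selection can be reproduced and justified later.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed only so the draw is reproducible

# Same hypothetical study-area bounds as before (placeholder units).
xmin, xmax, ymin, ymax = 0, 1000, 0, 800
n_samples = 50

# Every location has an equal chance of selection: draw x and y independently
# from a uniform distribution over the study area.
xs = rng.uniform(xmin, xmax, n_samples)
ys = rng.uniform(ymin, ymax, n_samples)
samples = np.column_stack([xs, ys])
```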

Cluster Sampling

Cluster sampling in GIS is ideal for studying large, geographically dispersed populations where sampling every individual is impractical or costly. It is especially useful when the population naturally forms clusters, such as neighborhoods, schools, or districts, allowing data collection to be concentrated in specific areas to save time and resources. This method involves establishing cluster centers—either randomly or systematically—with samples arranged around each center. The selected points are then plotted on a map and visited for data collection. A well-known example is the U.S. Forest Service’s Forest Inventory Analysis (FIA), where clusters are placed randomly, followed by a systematic sampling pattern within each location. One of the key advantages of cluster sampling is its efficiency; by reducing travel time, it makes data collection more practical and cost-effective.
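The sketch below illustrates the two stages in NumPy: cluster centres placed at random, then a small, fixed arrangement of sample points around each centre. The cross-shaped offsets are only loosely inspired by FIA-style plots and are not meant to reproduce the actual FIA design.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
xmin, xmax, ymin, ymax = 0, 1000, 0, 800  # hypothetical study area

# Stage 1: place cluster centres at random across the study area.
n_clusters = 5
centres = np.column_stack([rng.uniform(xmin, xmax, n_clusters),
                           rng.uniform(ymin, ymax, n_clusters)])

# Stage 2: a small systematic arrangement of samples around each centre
# (the centre plus four offsets forming a cross).
offsets = np.array([[0, 0], [0, 40], [40, 0], [0, -40], [-40, 0]])
samples = (centres[:, None, :] + offsets[None, :, :]).reshape(-1, 2)
```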

Adaptive Sampling

Adaptive sampling in GIS is useful when studying areas with high variability or uncertainty, allowing data collection to be concentrated where it provides the most value. This method is particularly effective for analyzing complex spatial patterns that may not be fully understood in advance, optimizing efficiency while minimizing the number of samples required. The sampling density adjusts based on observed conditions, increasing in areas with high variability to improve precision while reducing effort in more uniform regions. However, adaptive sampling requires prior knowledge of spatial variability, often through a two-stage sampling approach. Its main advantage is efficiency—homogeneous areas require fewer samples, while variable areas receive better representation. The primary drawback is the need for existing information on how variability is distributed across space.
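One way to prototype this two-stage idea is sketched below: a coarse first-stage grid is measured, a simple nearest-neighbour standard deviation serves as the variability estimate, and extra samples are added only around the most variable locations. All values here are synthetic, and this variability measure is just one of many possible choices.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(seed=3)

# Stage 1: a coarse systematic grid over a hypothetical study area,
# with synthetic stand-ins for the first-stage field measurements.
xs, ys = np.meshgrid(np.arange(0, 1000, 200), np.arange(0, 800, 200))
stage1 = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
observed = rng.normal(size=len(stage1))

# Local variability: standard deviation among each point's 4 nearest neighbours.
_, idx = cKDTree(stage1).query(stage1, k=4)
variability = observed[idx].std(axis=1)

# Stage 2: densify sampling only around the most variable first-stage points.
hotspots = stage1[variability > np.percentile(variability, 75)]
extra = np.vstack([c + rng.uniform(-100, 100, size=(4, 2)) for c in hotspots])
samples = np.vstack([stage1, extra])
```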

Common Interpolation Techniques

Interpolation methods vary widely, but they all serve the same fundamental purpose: using known sample points to estimate values at unmeasured locations. Each method differs in how it weighs sample points and how many are considered in predictions, leading to varied outcomes even when applied to the same dataset. Because no single approach works best in every situation, selecting the right interpolation technique depends on the characteristics of the data and the goals of the analysis.

Global Vs Local

In spatial interpolation, a “global” technique uses all available data points across the entire study area to calculate an estimated value at a given location, while a “local” technique only considers a smaller subset of nearby data points to make the estimation, focusing on a localized neighborhood around the point of interest (Climate Data Tools User Guide).

When you need a smooth, continuous surface across an entire study area and the data is relatively consistent, global interpolation techniques are the preferred choice. These methods consider all available data points to create broad, generalized predictions. On the other hand, when capturing localized variations is essential—especially in areas with significant spatial patterns or high variability—local interpolation techniques are more effective, as they focus on smaller subsets of data to reflect finer details in spatial distribution.

Deterministic Vs Stochastic

Interpolation methods can be broadly categorized into two main types: deterministic and stochastic. Deterministic interpolation relies on predefined spatial relationships, such as the degree of similarity or smoothing, to estimate values. In contrast, stochastic interpolation incorporates random functions and considers spatial dependence between points, quantifying spatial autocorrelation and accounting for the spatial configuration of sample points around the prediction location.

A key distinction between the two is that deterministic methods do not provide an assessment of prediction errors, whereas stochastic methods offer error estimates through variance calculations, enhancing reliability.

Just as interpolation methods can be categorized as global or local based on the area they cover, they can also be classified as exact or inexact depending on how they treat known data points. An ‘exact’ interpolation method ensures that the predicted value at a sampled location matches the measured value, meaning the surface passes directly through all known points. In contrast, an ‘inexact’ method allows for slight deviations, smoothing out variations to create a more generalized surface. Essentially, exact interpolation preserves the original data points, while inexact interpolation prioritizes smoothness and can introduce slight modifications to reduce noise.

Exact Interpolation

  • The interpolated value at a known data point is always the same as the measured value.
  • Used when maintaining data accuracy at sample points is essential.
  • Examples: Inverse Distance Weighting (IDW) with a very high power factor, Radial Basis Functions (RBFs).

Inexact Interpolation

  • The interpolated value at a known data point may differ from the measured value.
  • Commonly applied when dealing with noisy data or when a smoother surface is preferred.
  • Examples: Polynomial Regression, Kriging with a nugget effect.

Spatial Interpolation Tools in ArcGIS Pro

Thiessen Polygons

Thiessen polygons, also known as Voronoi polygons, are a simple yet powerful method for defining areas of influence around a set of sample points. Used in various fields such as geography, agriculture, and business, this technique helps determine which point in a dataset is closest to any given location.

The concept behind Thiessen polygons is straightforward: each unknown location is assigned the value of its nearest sample point—without averaging, weighting, or considering multiple points. Essentially, the closest data point determines the value for the entire region around it.

However, while Thiessen polygons offer a quick way to delineate areas of influence, they cannot capture gradual spatial variations; they remain a valuable tool for applications where proximity is the primary concern.
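Outside ArcGIS Pro, the nearest-point logic behind Thiessen polygons is easy to sketch: assign every raster cell the value of its closest sample point, which amounts to rasterising the Voronoi diagram. The points, values, and grid below are made up for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical sample points and their measured values.
points = np.array([[2.0, 3.0], [7.0, 8.0], [5.0, 1.0], [9.0, 4.0]])
values = np.array([12.0, 20.0, 15.0, 18.0])

# Raster grid of unknown locations covering the study area.
gx, gy = np.meshgrid(np.linspace(0, 10, 100), np.linspace(0, 10, 100))
cells = np.column_stack([gx.ravel(), gy.ravel()])

# Each cell takes the value of its nearest sample point: no averaging,
# no weighting, no consideration of other points.
_, nearest = cKDTree(points).query(cells)
surface = values[nearest].reshape(gx.shape)
```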

For more information about this technique, visit: Create Thiessen Polygons (Analysis) (ESRI, ArcGIS Pro 3.4).

Inverse Distance Weighted (IDW)

Inverse distance weighted (IDW) interpolation is a spatial interpolation method that estimates values at unknown locations based on nearby known data points. It is commonly used for predicting variables such as elevation, noise levels, or consumer purchasing power.

IDW operates on the principle that values at closer points are more similar than those farther away. To estimate an unknown value, the method assigns weights to known data points based on their distance from the target location, with closer points exerting a stronger influence. These weights are then used to calculate a weighted average, meaning nearby points contribute more to the final estimate than distant ones. This approach ensures that the estimated value reflects the spatial distribution of the surrounding data while maintaining a smooth transition between points.
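A bare-bones version of this weighting scheme can be written in a few lines of NumPy, shown below. It is a sketch of the general IDW idea (using the common power of 2), not a reimplementation of the ArcGIS Pro tool, and the rainfall readings are invented.

```python
import numpy as np

def idw(points, values, targets, power=2):
    """Inverse distance weighted estimate at each target location."""
    # Distance from every target location to every sample point.
    d = np.linalg.norm(targets[:, None, :] - points[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)        # avoid division by zero at sample points
    w = 1.0 / d**power              # closer points receive larger weights
    return (w * values).sum(axis=1) / w.sum(axis=1)

# Hypothetical rainfall readings (x, y) with values in mm.
points = np.array([[1.0, 1.0], [4.0, 2.0], [2.0, 5.0], [5.0, 5.0]])
values = np.array([30.0, 42.0, 38.0, 50.0])

estimate = idw(points, values, np.array([[3.0, 3.0]]))
print(estimate)  # a weighted average dominated by the nearest readings
```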

For more information about this technique, visit: How inverse distance weighted interpolation works (ESRI, ArcGIS Pro 3.4).

Kriging Interpolation

Kriging interpolation is a geostatistical method used to estimate values at unmeasured locations based on known values from nearby points. For example, this method can help predict things like average monthly ozone levels across a city or the availability of healthy food in different neighborhoods. Unlike simpler methods such as IDW, Kriging takes into account how data points are related to each other in space rather than assuming a specific pattern of distribution. It analyzes the spatial arrangement of the data to make more accurate predictions. Another advantage of Kriging is that it also provides an estimate of how uncertain each prediction is, helping to understand the reliability of the results.

This interpolation method works by incorporating spatial autocorrelation, which means it considers the statistical relationships between observed data points to improve accuracy. It assigns weights using a linear model, factoring in both the distance and orientation of points. As a result, closer and more similar points have a stronger influence on the predicted values. By accounting for these spatial relationships, Kriging provides a more precise and statistically reliable approach to interpolation.
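The sketch below shows what an ordinary kriging workflow can look like outside ArcGIS Pro, using the third-party pykrige package. The station readings are invented and the spherical variogram is just one common model choice, so treat it as an illustration rather than a recipe.

```python
import numpy as np
from pykrige.ok import OrdinaryKriging  # third-party package: pykrige

# Hypothetical ozone readings at monitoring stations (x, y, value).
x = np.array([1.0, 3.0, 4.5, 6.0, 8.0])
y = np.array([2.0, 7.0, 4.0, 1.5, 6.5])
z = np.array([31.0, 40.0, 36.0, 29.0, 44.0])

# Fit an ordinary kriging model; the variogram captures how similarity
# between observations decays with distance.
ok = OrdinaryKriging(x, y, z, variogram_model="spherical")

# Predict on a regular grid; ss holds the kriging variance, i.e. an
# estimate of the uncertainty attached to each prediction.
gridx = np.linspace(0.0, 10.0, 50)
gridy = np.linspace(0.0, 10.0, 50)
z_pred, ss = ok.execute("grid", gridx, gridy)
```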

Natural Neighbor

Natural neighbor interpolation estimates values by finding the closest input points to a given location and assigning weights based on their surrounding areas. Also known as Sibson or ‘area-stealing’ interpolation, this method is local, meaning it only considers nearby points for each estimate. One key advantage is that it ensures the predicted values stay within the range of the original data, avoiding the creation of artificial peaks, pits, ridges, or valleys. The surface smoothly connects the input points but remains slightly less smooth exactly at those locations.

In cases like TIN-to-raster interpolation, breaklines can be used to introduce sharp transitions where needed, such as along roads or water bodies. Natural neighbor interpolation adapts automatically to the structure of the data without requiring users to define parameters like search radius, sample count, or shape. It works well for both regularly and irregularly spaced data, making it a flexible and reliable method for spatial analysis.

For more information about this technique, visit: Natural Neighbor (Spatial Analyst)

Further Reading

Please find below additional readings that will be helpful:

  • Chang, K. T. (2008). Introduction to geographic information systems (Vol. 4). Boston: McGraw-Hill.
  • DeMers, M. N. (2008). Fundamentals of geographic information systems. John Wiley & Sons.
  • Mitas, L., & Mitasova, H. (1999). Spatial interpolation. Geographical information systems: principles, techniques, management and applications, 1(2), 481-492.
  • Lam, N. S. N. (1983). Spatial interpolation methods: a review. The American Cartographer, 10(2), 129-150.

Here are some publications on interpolation applications that I have found useful:

  • Vasudevan, V., Gundabattini, E., & Gnanaraj, S. D. (2024). Geographical Information System (GIS)-Based Solar Photovoltaic Farm Site Suitability Using Multi-criteria Approach (MCA) in Southern Tamilnadu, India. Journal of The Institution of Engineers (India): Series C, 105(1), 81-99.
  • Earls, J., & Dixon, B. (2007, June). Spatial interpolation of rainfall data using ArcGIS: A comparative study. In Proceedings of the 27th Annual ESRI International User Conference (Vol. 31, pp. 1-9).
  • Zhang, Z., Zhou, Y., & Huang, X. (2020). Applicability of GIS-based spatial interpolation and simulation for estimating the soil organic carbon storage in karst regions. Global Ecology and Conservation, 21, e00849.

Data Cleaning and Exploratory Process

Tableau Software

Tableau is a powerful data visualization and business intelligence platform that helps individuals and organizations transform raw data into insights. Known for its intuitive drag-and-drop interface, Tableau enables users to create dynamic dashboards, interactive visualizations, and detailed reports without requiring extensive technical expertise. Tableau supports data integration from multiple sources, including spreadsheets, databases, and cloud services, making it versatile for various industries and use cases.

Tableau Prep Builder, part of Tableau’s ecosystem, is a tool designed to streamline the process of preparing and shaping data for analysis. It allows users to combine, clean, and organize datasets with ease, ensuring data quality and readiness before visualization. With features like automated data modeling, real-time collaboration, and support for complex transformations, Tableau Prep Builder empowers users to focus more on analyzing and interpreting data rather than spending time on preprocessing tasks.

In this tutorial, you will explore both Tableau software and Tableau Prep Builder. Let’s consider the case of a data analyst working for a city’s police department. The department has recently decided to modernize its crime database to enhance the accuracy of its crime mapping system. The analyst is tasked with cleaning and restructuring the existing crime data to make it more reliable and actionable for visualization and decision-making.

Preparing the Dataset for Analysis

In this tutorial, we will work with a dataset containing crime records for Mexico City. This dataset includes information such as the type of crime, the date and time it occurred, and spatial data like latitude and longitude. However, like many real-world datasets, it contains inconsistencies and missing values that need to be addressed before analysis. Using Tableau Prep Builder, we will clean and prepare the dataset while ensuring the original file remains untouched. This step is critical for maintaining data integrity and ensuring accurate and actionable insights during visualization.

To prepare the dataset for analysis and meet the specific criteria required for visualization, we will address the following:

  • Eliminate records without spatial location using Tableau Prep Builder. Records without spatial information (latitude and longitude) cannot be mapped effectively.
  • Define a relevant time frame for the analysis and exclude records outside this range. This ensures the data is consistent and focused on the period of interest.
  • Filter out entries that do not include a timestamp or date of when the crime occurred. This ensures the dataset can support time-based visualizations and analysis.
  • Identify and correct erroneous values in the longitude field to maintain the integrity of spatial data. Tableau Prep Builder allows for easy detection and rectification of such anomalies.
  • Extract all records related to robbery in any of its forms. This subset can then be used for specific visualizations or in-depth analysis focusing on robbery trends in Mexico City.

Once these steps are completed, the prepared dataset will be ready for use in Tableau, where you can create visualizations to uncover trends and insights in crime data. By using Tableau Prep Builder for this process, you ensure the original dataset remains untouched while maintaining a streamlined and clean data preparation workflow.
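For readers who prefer scripting, the same criteria could be sketched in pandas as shown below. The column names (‘latitud’, ‘longitud’, ‘fecha_hecho’, ‘delito’) and the date range are assumptions for illustration; the actual file may use different names, and the Tableau Prep flow described in this tutorial remains the reference workflow.

```python
import pandas as pd

# Work on a copy so the original CSV stays untouched; column names are assumed.
df = pd.read_csv("victimas_completa_octubre_2021.csv")

# 1. Drop records without spatial location or without a date of occurrence.
df = df.dropna(subset=["latitud", "longitud", "fecha_hecho"])

# 2. Keep only the time frame of interest (example range).
df["fecha_hecho"] = pd.to_datetime(df["fecha_hecho"], errors="coerce")
df = df[df["fecha_hecho"].between("2021-01-01", "2021-10-31")]

# 3. Fix 'false positive' longitudes: Mexico City longitudes must be negative.
df["longitud"] = df["longitud"].abs() * -1

# 4. Extract robbery records in any of their forms (Spanish: 'robo').
robberies = df[df["delito"].str.contains("robo", case=False, na=False)]

robberies.to_csv("Cleaned_Crime_Data_Oct_2021.csv", index=False)
```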

The following steps will be followed:

  • Connecting the file in Tableau – exploring the dataset!
  • Create a Clean Step
    • Handle null values and correct ‘false positive’ longitude values in the coordinates
    • Verify that the dates are correct
  • Add an Output step and specify the file name, location, and output format
  • Run the model

1. Connecting the File in Tableau

Launch Tableau, go to the “Connect” pane, and select the appropriate file type (e.g., CSV, Excel). Browse to the file location, open it, and preview the data to ensure it loads correctly. Familiarize yourself with the dataset structure and content.

Connect to file – in this case, we are adding a text file (*.csv)

  • Next, browse to the folder \Data\Data_CdMx_Crime
  • Open the file named ‘victimas_completa_octubre_2021.csv’
  • Confirm that the input data parameters (e.g., file format, column names, and data types) match the expected structure of your dataset.
  • Double-check that you are using a copy of the original file to preserve the integrity of the raw dataset.

2. Clean Step

After connecting to the dataset, add a “Clean Step” to start preparing the data. This will allow you to apply transformations without altering the original file.

Review the summary graphs provided in Tableau Prep Builder for each attribute. These visualizations display the distribution of the data and help identify patterns, outliers, or inconsistencies. Also, look for any extreme values, unusual patterns, or inconsistencies in the dataset. Pay particular attention to numeric fields, spatial data, and date ranges.

Next, we will create filters to remove outliers, null values, and any records that do not meet the criteria for your analysis, and adjust data types where needed. For example, filter out records with missing spatial data, incorrect longitude/latitude values, or invalid date entries.

Check for Null values in coordinates in the latitude and longitude fields. Null values indicate missing spatial data, which should be filtered out to ensure accuracy in mapping.

Next, we will identify and correct the ‘false positive’ longitude values. In Mexico City, longitude values should always be negative. However, some entries might have false positive values (e.g., incorrectly recorded as positive). To fix this:

  • Add a calculated field by clicking on the dropdown arrow next to the longitude field and selecting “Create Calculated Field.”
  • Name the calculated field “Longitud_Fix” for clarity.
  • Enter the following formula into the calculated field editor: (ABS([longitud])) * -1
  • This formula takes the absolute value of the longitude (ensuring it is positive) and multiplies it by -1 to make it negative, as it should be for Mexico City.
  • If you no longer need the original longitude field, replace it with “Longitud_Fix” in your dataset to standardize your data.

These steps ensure that all longitude values are accurate and consistent with the spatial requirements for Mexico City, improving the reliability of your mapping visualizations.

Review all fields in the dataset to ensure they are complete and accurate. Pay close attention to the ‘dates’ field to verify that all entries are valid and correctly formatted. Also, identify and delete all records with null values in the following fields:

  • Latitude
  • Longitude
  • Fecha Hecho (Date of the incident)

Once the data cleaning is complete, save your Tableau Prep flow to ensure all changes are retained for future reference. When saving the output files, make sure to store them within the designated project’s folder. This will help maintain organization and ensure all project-related files are easily accessible.

3. Creating an Output

In the Tableau Prep Builder interface, locate the Output option in the toolbar or right-click on the last step in your flow and select Add Output.

In the output configuration panel, you will need to specify the following:

  • File Name: Enter a descriptive name for the output file (e.g., “Cleaned_Crime_Data_Oct_2021”).
  • File Format: Choose the desired output format (e.g., CSV, Excel, or Tableau Hyper Extract). Select the format based on how you intend to use the dataset.
  • File Location: Browse to your project’s folder and set it as the destination for saving the output file. Ensuring the file is saved in the correct location helps maintain project organization.
  • Run the model!

Spatial Octopus

Dive into the fascinating world of geospatial technology with Spatial Octopus—your go-to hub for tutorials, news, blog posts, and contributions from a global community of enthusiasts and experts. Whether you’re a seasoned professional or just beginning your geospatial journey, we’re here to inspire, inform, and connect you with like-minded individuals.

Our mission is to make geospatial knowledge accessible and engaging while fostering a vibrant network for collaboration and innovation. With topics ranging from cutting-edge technology to practical applications, we aim to highlight how geospatial insights can transform our understanding of the world and drive meaningful change.

About Me

Spatial Octopus was founded by Marcia Moreno-Baez, a dedicated expert in geospatial technology with over 25 years of experience spanning academia, government, NGOs, and the private sector.

Her work focuses on bridging social, economic, and environmental goals through applied geospatial research, fostering equity and justice by uniting diverse sectors and stakeholders. Passionate about empowering others, Marcia teaches classes that explore the transformative potential of geospatial methods in areas such as climate change, humanitarian assistance, and natural resource management across the globe.

Join the Adventure

Spatial Octopus is more than just a resource—it’s a community. Engage with insightful tutorials, stay informed with the latest geospatial news, contribute your own expertise, and enjoy the fun side of geospatial exploration. Together, we can chart new territories and create a more connected, sustainable future.

Welcome aboard the Spatial Octopus adventure!