Introduction to GIS Data: Choosing, Understanding, and Using It Responsibly

Choosing the Right Layers in GIS

When you’re first exploring a data portal, it can be tempting to download and add every layer you see. After all, more data feels like more power, right? But in GIS, adding too much information can quickly lead to cluttered maps, longer processing times, and even misleading results.

So, how do you decide which layers to include? Here are a few guiding questions to keep in mind:

What’s my research question or project goal?

Start with a clear purpose! Each layer you add should directly support answering your research question. Avoid getting distracted by extra datasets that don’t contribute to your analysis. If a layer doesn’t help you address your question, leave it out.

Suppose your research question is: Which oblasts (i.e., administrative units) in Ukraine have experienced the most intense battles, based on ACLED conflict data for 2021 to 2024?

What are the layers you will need for the analysis?

Is the layer relevant to the spatial scale and time frame I’m working with?

Before you decide whether a dataset is useful, you need to clarify two things about your project:

  • Spatial scale (the where): Are you studying patterns at the neighborhood, city, regional, national, or global level? The appropriate dataset will depend on how detailed your study needs to be.
  • Temporal scale (the when): Are you studying a single event, a short period (days or months), or long-term trends (years or decades)? You’ll need data that covers the correct time frame and is updated frequently enough for your purpose.

Suppose your research question is: Which departments (Admin 1) in Colombia have experienced the highest levels of internal displacement between 2008 and 2018?

What are the layers you will need for the analysis?

Here are some examples of layers that would not be relevant:

  • Global refugee datasets (focus on international flows, not internal).
  • Outdated census data from 2000 without migration updates.
  • Land cover or elevation layers (interesting for context, but not needed to answer the specific question).
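
If you like to see ideas as code, here is a minimal sketch of the "where and when" check described above. It is only an illustration: the layer names, admin levels, and years are made up, and `covers_study` is a hypothetical helper, not part of any GIS library.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    admin_level: int  # 0 = country, 1 = department, 2 = municipality
    start_year: int   # first year the data covers
    end_year: int     # last year the data covers


def covers_study(layer, admin_level, start_year, end_year):
    """True if the layer matches the admin level and spans the whole study period."""
    return (
        layer.admin_level == admin_level
        and layer.start_year <= start_year
        and layer.end_year >= end_year
    )


# Hypothetical candidate layers for the Colombia example.
candidates = [
    Layer("internal_displacement_by_department", 1, 2005, 2020),
    Layer("census_2000", 1, 2000, 2000),
    Layer("national_displacement_totals", 0, 2008, 2018),
]

relevant = [layer.name for layer in candidates if covers_study(layer, 1, 2008, 2018)]
print(relevant)  # ['internal_displacement_by_department']
```

In a real project you would read the scale and time frame from each dataset's documentation rather than typing them in by hand.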

What’s the data quality?

When choosing which datasets to include, it’s not just about finding any data — it’s about finding the right quality of data for your project. Poor-quality data can lead to misleading results, no matter how sophisticated your analysis is.

Here are a few key things to look at:

  • Accuracy: Does the dataset represent the real-world features correctly? If roads are misplaced on a map, even by a small margin, analyses like routing or proximity calculations could fail!
  • Resolution / Scale: How detailed is the data (spatial resolution for raster data, or scale for vector data)? For example, a global land cover dataset at 1 km resolution may not be useful if you’re studying tree canopy in a single neighborhood! For that, you might need a resolution of one meter or finer.
  • Timeliness / Update Frequency: How recently was the data collected or updated? For example, population data from 2010 won’t reflect migration trends in 2023. Sometimes, though, older data may be your only option. If that’s the case, be transparent about the dataset’s time frame and acknowledge its limitations.
  • Consistency: Does the dataset align with your other layers in terms of coordinate system, units, and definitions? For example, if one dataset uses metric units and another uses imperial, your analysis could be skewed. You will need to transform one of those datasets to make sure they have the same units.
  • Completeness: Are there gaps, missing attributes, or areas not covered? For example, if health facility data is missing rural clinics, your accessibility analysis will be biased toward urban areas.
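
Two of the checks above, consistency of units and completeness of attributes, are easy to sketch in code. The example below is a rough illustration under stated assumptions: the health facility records are invented, and `to_km` and `missing_fields` are hypothetical helper names.

```python
MILES_TO_KM = 1.609344  # one international mile in kilometers

# Invented records: distances come in mixed units and some attributes are missing.
facilities = [
    {"name": "Clinic A", "type": "clinic", "distance": 3.0, "unit": "km"},
    {"name": "Clinic B", "type": None, "distance": 2.0, "unit": "mi"},
    {"name": "Hospital C", "type": "hospital", "distance": None, "unit": "km"},
]


def to_km(value, unit):
    """Convert a distance to kilometers so all records share one unit."""
    if value is None:
        return None
    if unit == "km":
        return value
    if unit == "mi":
        return value * MILES_TO_KM
    raise ValueError(f"Unknown unit: {unit}")


def missing_fields(record, required):
    """List the required attributes that are empty in a record."""
    return [field for field in required if record.get(field) is None]


for record in facilities:
    record["distance_km"] = to_km(record["distance"], record["unit"])

# Keep only the records that have at least one gap.
incomplete = {}
for record in facilities:
    gaps = missing_fields(record, ["type", "distance"])
    if gaps:
        incomplete[record["name"]] = gaps

print(incomplete)  # {'Clinic B': ['type'], 'Hospital C': ['distance']}
```

Knowing which records are incomplete lets you decide whether to fill the gaps, drop those records, or simply report the limitation.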

High-quality, reliable data usually beats a long list of questionable layers. It’s better to work with fewer datasets that are accurate, current, and well-documented than to overload your project with incomplete or outdated information.

Will this layer improve my analysis or just add noise?

It’s easy to think that more data = better analysis. But in GIS, adding too many layers can actually make your work harder to interpret. Every extra dataset comes with its own attributes, scale, and potential errors — which can clutter your map and distract from the real story you’re trying to tell.

Sometimes fewer, carefully chosen layers help tell a clearer story than a crowded map with too much information.

Think of this analogy – you will notice I like to think about food!

  • Relevant layers = ingredients in a recipe.
  • Irrelevant layers = random spices thrown in for no reason!
  • The result? A dish (or map) that doesn’t taste right.

Spatial and Tabular Data

In GIS, every dataset has two essential parts:

  • Spatial Data (the “where”), which describes the location and shape of features on the earth’s surface.
    • Examples:
      • A point showing the location of a school.
      • A line showing the path of a river.
      • A polygon showing the boundaries of a city.
  • Tabular Data (the “what”), which provides attributes or information about each feature.
    • Examples:
      • A school’s name, number of students, and year it was built.
      • A river’s length, water quality rating, and average flow.
      • A city’s population, mayor, and income levels.

Think of spatial data as a map of where things are and tabular data as a table of what we know about them. GIS links these two parts together:

  • Each feature on the map corresponds to a row in the attribute table.
  • Each column in the table represents a different piece of information.
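
Here is a small sketch of that link, using the GeoJSON convention of storing a `geometry` (the where) next to `properties` (the what) for each feature. The schools, coordinates, and attribute values are invented for illustration.

```python
# A tiny point layer: each feature pairs a location with its attributes.
schools = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-71.12, 42.41]},
        "properties": {"name": "Example School A", "students": 450, "year_built": 1998},
    },
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-71.10, 42.39]},
        "properties": {"name": "Example School B", "students": 320, "year_built": 2005},
    },
]

# The attribute table has one row per feature...
attribute_table = [feature["properties"] for feature in schools]

# ...and one column per piece of information.
columns = sorted(attribute_table[0].keys())

print(len(attribute_table))  # 2
print(columns)               # ['name', 'students', 'year_built']
```

GIS software does the same thing behind the scenes: selecting a row in the table highlights the matching feature on the map, and vice versa.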

Understanding Metadata in GIS

In GIS, metadata is data that describes other data, acting as an “ID card” or “instruction manual” for geographic datasets, maps, and other spatial information. It answers the who, what, when, where, and why of the data, detailing its content, quality, origin, and limitations. Proper metadata is crucial for discovering, evaluating, and using data effectively, ensuring its transparency and suitability for specific applications.

Please check the following one-minute video:

When deciding whether to use a data layer, the metadata is your best friend: it tells you the who, what, when, where, why, and how behind a dataset.

A couple of links that you can explore as well are below:

Here are some key pieces of metadata to look for:

  1. Title and Description
    • What does this dataset represent?
    • Is it clearly documented, or is it vague?
  2. Source/Creator
    • Who produced the data? (e.g., a government agency, research group, nonprofit, private company)
    • Do you trust that source?
  3. Date of Creation/Update
    • How current is the data?
    • Does the timeframe align with your project needs?
  4. Geographic Extent
    • What area does the dataset cover?
    • Does it match your study region?
  5. Spatial Resolution / Scale
    • At what level of detail was the data collected?
    • Is it appropriate for the scale of your analysis?
  6. Coordinate System / Projection
    • What spatial reference is the dataset in?
    • Will it align correctly with your other layers?
  7. Limitations / Use Constraints
    • Are there restrictions on how you can use the data?
    • Are there known accuracy issues you should keep in mind?
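
The checklist above can be turned into a quick review script. This is a minimal sketch, assuming each dataset's metadata has already been typed into a plain dictionary; the field names, the two example records, and the helpers `missing_metadata` and `same_crs` are all hypothetical.

```python
# One key per checklist item above (names chosen for this example only).
REQUIRED_FIELDS = ["title", "source", "updated", "extent", "resolution", "crs", "constraints"]


def missing_metadata(record):
    """List the checklist fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def same_crs(records):
    """True when every record reports the same coordinate reference system."""
    return len({record.get("crs") for record in records}) == 1


# Invented metadata records for two layers.
boundaries = {
    "title": "Department boundaries", "source": "National statistics office",
    "updated": "2018-06-01", "extent": "Colombia", "resolution": "1:100,000",
    "crs": "EPSG:4326", "constraints": "Open license",
}
displacement = {
    "title": "Displacement counts", "source": "", "updated": "2019-01-15",
    "extent": "Colombia", "crs": "EPSG:32618",
}

print(missing_metadata(boundaries))          # []
print(missing_metadata(displacement))        # ['source', 'resolution', 'constraints']
print(same_crs([boundaries, displacement]))  # False
```

A missing field is not automatically a deal-breaker, but it is a prompt to dig deeper before trusting the layer, and a coordinate system mismatch tells you that one layer will need to be reprojected.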

Methods of Data Gathering

Spatial and tabular data used in GIS can come from many different sources. Each method of data collection has its own strengths, limitations, and appropriate applications. Understanding these methods will help you evaluate how reliable a dataset is and whether it’s suitable for your project.

  • Crowdsourcing: Data collected through voluntary contributions from the public. It can cover areas where official data is lacking. There are many examples of crowdsourcing, but here are a few to get you started:
  • Surveys: Structured or semi-structured data collection through questionnaires, interviews, or forms. Surveys provide rich, human-centered information. Examples include national censuses, household surveys, and market research. Keep in mind, however, that surveys are usually expensive and time-consuming to conduct, and the data may quickly become outdated.
  • Remote Sensing: Data captured by satellites, drones, or aircraft sensors. It offers large-scale, repeated coverage of hard-to-access areas. You can find a few examples below:
  • Field Observations: Direct, on-the-ground data collected manually or with instruments. As you might imagine, this option provides highly accurate, localized information; however, it is usually limited in area coverage and tends to be costly and time-intensive. Well-known examples include biodiversity surveys and water quality sampling, among others. I will offer only one example below, which I hope you can explore:
  • Social Media Data: Information extracted from platforms like Twitter, Facebook, or TikTok. It provides real-time insights into human activity and behavior. Keep in mind that these datasets may be biased toward users of certain platforms, and that they raise privacy and ethics concerns. If you are interested, check the following post from ESRI ArcNews: Human Behavior on Social Media Is Big Data, and GIS Makes It Actionable.
  • IoT (Internet of Things) Devices: Data generated by smart devices, networks, and sensors. The data is usually collected continuously and automatically. If you are interested in learning more about this option, check the following post from ESRI Industry Blogs: IoT and GIS—Creating the Nervous System for Digital Twins.
  • Big Data: Massive datasets derived from digital activities. This option allows large-scale analysis of patterns and trends. There are several examples, but here are a few to get you started:
  • Artificial Intelligence (AI) and Machine Learning (ML): Data generated or refined through predictive models and algorithms. This method can reveal patterns not easily visible in raw data. Examples include predictive health models, land cover classification, and traffic forecasts. Below are a couple of examples:
  • Simulations / Models: Data produced through computational models. They are useful when direct measurement is impossible. Examples include climate change projection models, hydrological models, and economic forecasts. Below are a couple of examples:

Data Ethical Considerations

Working with spatial data isn’t just a technical process — it also comes with ethical responsibilities. The choices you make about what data to collect, how to analyze it, and how to share results can have real-world consequences. Below are some key areas to keep in mind:

  1. Data Privacy and Security: Some location data can reveal sensitive information about individuals or communities. For example, mapping hospital visits, protest locations, or home addresses could unintentionally expose personal details. Always check whether data contains personally identifiable information (PII), and follow the applicable rules or anonymization practices before sharing.
  2. Accuracy and Misinformation: Poor-quality data or misapplied methods can lead to false conclusions. For example, using outdated floodplain maps for city planning may put people at risk. Always verify sources, check metadata, and be transparent about limitations in your analysis.
  3. Equity and Access: Not all communities have the same ability to collect, access, or benefit from data. Be mindful of who is included or excluded in datasets, and think about how your work can reduce, not reinforce, inequities. Believe it or not, wealthier cities may have detailed open data portals, while rural or marginalized communities may be underrepresented. Just check which cities or towns in New Hampshire have their data or services available online!
  4. Informed Consent: Data about people or communities should not be collected or shared without their knowledge or permission. When working with human data, it is crucial to seek informed consent and respect cultural protocols. For example, publishing maps of Indigenous lands without community approval can cause harm.
  5. Transparency and Accountability: Document your decisions, explain your methods, and make clear where uncertainty exists in every job or research project you do. For example, changing the color scales or categories on a map can make an issue in a community seem more or less severe.
  6. Long-Term Impact: The effects of sharing data may not be immediate but could have lasting consequences. For example, openly sharing biodiversity maps might unintentionally aid illegal poaching when the intentions are the opposite! Always consider how your data could be used in the future, both positively and negatively.
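
To make the privacy point concrete, one simple precaution before sharing point data is to drop identifying attributes and reduce coordinate precision. The sketch below is only an illustration with invented records; `generalize` is a hypothetical helper, and rounding alone does not guarantee anonymity, especially in sparsely populated areas.

```python
SENSITIVE_FIELDS = {"name", "address"}  # attributes treated as PII in this example


def generalize(record, decimals=2):
    """Return a copy without PII fields and with rounded coordinates.

    Two decimal places of latitude is roughly 1 km on the ground.
    """
    shared = {key: value for key, value in record.items() if key not in SENSITIVE_FIELDS}
    shared["lat"] = round(record["lat"], decimals)
    shared["lon"] = round(record["lon"], decimals)
    return shared


# Invented survey responses tied to home locations.
responses = [
    {"name": "Respondent 1", "address": "1 Example St", "lat": 42.40751, "lon": -71.11903, "answer": "yes"},
    {"name": "Respondent 2", "address": "2 Example St", "lat": 42.39284, "lon": -71.10467, "answer": "no"},
]

shareable = [generalize(record) for record in responses]
print(shareable[0])  # {'lat': 42.41, 'lon': -71.12, 'answer': 'yes'}
```

Whether this level of generalization is enough depends on your data and on the rules that apply to your project, so treat it as a starting point rather than a guarantee.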

GIS is powerful because it connects people, places, and information — but with that power comes responsibility. Ethical data practices ensure your work is not only accurate, but also respectful, fair, and socially responsible.

Feel free to check the following post: Ethics of GIS: Balancing the Benefits and Risks of Geospatial Technologies

Sources of Spatial Information

When building a GIS project, knowing where your data comes from is just as important as knowing how to use it. Not all sources are equal—some are highly reliable, while others may be outdated or incomplete.

Here are some common sources of spatial data:

  1. Government Agencies: Examples include the U.S. Census Bureau, USGS, NOAA, EPA, and local city or county GIS offices. Their data is usually free, authoritative, and well-documented.
  2. International Organizations: Examples include the United Nations, World Bank, FAO, and NASA. They usually offer global coverage and standardized datasets, although sometimes only at coarse resolution.
  3. Research Institutions and Universities: Examples include our own university, Tufts! The compilation done by the Data Lab includes specialized datasets created for research projects. One thing to watch for is that these datasets may not always have long-term support or updates.
  4. Commercial Providers: Examples include ESRI, Planet, and Google. They usually offer high-resolution, often cutting-edge datasets. However, and this is KEY, they usually have licensing restrictions and costs. Be careful about using these datasets in your final projects, as sometimes you can only use them for visualization purposes.
  5. Open Data Portals: Examples include Data.gov, local government open data sites, and OpenStreetMap. These portals are usually very accessible, often community-driven, and free. However, it is important to check their accuracy and documentation, which vary from dataset to dataset.
  6. Published Papers: These include articles in journals like Nature, Science, or discipline-specific publications (e.g., Remote Sensing of Environment, Ecological Applications). They often contain cutting-edge research and unique datasets that aren’t available elsewhere. They also provide context, methodology, and interpretation that help you understand how the data was collected. However, be aware that data is often summarized in tables, charts, or figures, and is not always shared as downloadable shapefiles or GIS-ready datasets. Sometimes supplemental material includes coordinates, gridded data, or shapefiles, but not always. You may need to digitize information (e.g., georeference a map from a paper) or contact the authors for access.
  7. Your Own Data: This includes GPS field collection, surveys, and drone imagery! This option can be tailored to your project’s exact needs. Just keep in mind that the data can be time-intensive to collect, and it is important to account for potential measurement errors.

Some Geospatial Data and Where to Download Them

Within Data Lab @ Tufts:

Shapefile (Administrative Boundary, country masked DEM, and more)

30m Digital Elevation Model

90m Elevation Model

225m, 250m, 1km Elevation Model

1m Elevation Data

Population Density

Rainfall data

Climate Data

Land Use Land Cover Data

Geologic Data

World administrative boundary

Monitoring Data

Soil Data

Energy & Environment Data

Protected Areas and Biodiversity

Moon Elevation Model

Marine Regions and Data

Wind and Solar Energy

Data at a country level (Admin 0)
