Using QGIS to analyze property value, Part 1

GIS, or geographic information systems, is one of the most invaluable tools for data science and engineering, with its adoption being from environmental conservation, election statistics, urban planning, or even criminology. When you pair it with a scripting console, such as R-Project or Python, you can manipulate your data on a large scale before laying out the visuals on a map.

I can demonstrate an example of the benefits of using the popular open source QGIS in conjunction with R using a popular urban planning topic - property tax revenue.

Before we begin, I will recognize one of the most intimidating aspects of GIS for beginners is finding useful web resources. I will provide my sources as I go. We will establish a map by downloading shapefiles from Penn State’s geospatial data clearinghouse and download the following: Allegheny County - Parcels.

It’s recommended that you save everything into a single project folder. Import the shapefile into QGIS with Ctrl+Shift+V and open the files.

Figure 1 - Initial project folder.

If you get a prompt regarding transformation type when importing, just hit OK. We’re not worried about transformations. You should have a map of Pittsburgh like shown below:

Figure 2 - Initial imported map of Allegheny County.

We’re going to look at the feature names provided in the tax parcel map. Select the parcel layer in the pane, then from the dropdown menu, click View > Identify features, and select a tax parcel. The results panel that pops up on the right will give you the attributes to the property that you selected.

Figure 3 - Feature attributes to the tax parcel selected. We will use these properties as a reference because we are going to join an external source to it.

One of the problems that we’ll have to negotiate, is that, this shapefile does not have all the parameters that we need! We need the property’s market value and other critically relevant data such as exempt statuses.

Nevertheless, we can still pull these real estate parameters from an external source. With every property labeled a unique Parcel ID, in this case the property name is called PIN, this will be our “join” attribute that will tie with the external data file containing what we need.

Analysis Continued in R-Project

Luckily despite this setback, we can download not a shapefile, but a CSV containing the county assessment data that we need, and we will compute the tax billing for each property in R-Project prior to moving back to QGIS. This CSV is found on the University of Pittsburgh’s Western Pennsylvania Regional Data Center. The download page for the property records CSV file can be found here.

In the state of Pennsylvania, the county’s assessment breaks up real estate appraisal into categories of land value and building value, and the sum of these two equal the taxable market value. The landowner’s annual tax bill, from the county is simply calculated by multiplying this taxable market value by 4.73 mils, or 0.00473. There is also an $18,000 reduction for the building appraisal if the property is the homeowner’s primary residence. A full exemption applies if the property qualifies. Pretty simple! We can easily translate all of this into R code.

Also note that, in the CSV file, the MUNIDESC column lists the municipality of the property, but in the cities, it includes the Ward number. For example, a property in a township may be listed as “Brentwood” but one in a city may be listed as “10th Ward - Pittsburgh”. I created a separate column called “Municipality” that strips out the Ward number using Regex replacement.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
library(dplyr);
library(rtern);
library(stringr);
setwd("~/qgis-projects/propertytaxmap");

# Due to the massive number of properties in the county, we
# will test the code only against 2000 properties until we're
# confident that the code is OK.
assessment.csv = read.csv('assessments.csv') %>% tail(2000);
taxrate.county = 0.00473;

assessment = assessment.csv %>%
mutate(countyTax = TAXCODE == 'E' ? 0 : round(COUNTYTOTAL*taxrate.county,2)) %>%
mutate(municipality = str_replace(MUNIDESC,
# Regex to remove the "nth Ward - " from the municipality name
"^(?:(?:\\d+)(?:st|nd|rd|th)(?:\\s+)Ward(?:\\s+)-(?:\\s+))?((?:\\S|(?:\\s(?=\\S)))+)(\\s+)?$",
"\\1"));

We can also verify our math by checking the county’s assessment website, and looking at a homeowner’s tax bill since that info is public record, and it looks like our calculations are spot on. In this case, the appraisals on the CSV file already included this deduction, so the math is easy. However, we will have to reverse this deduction when computing borough tax billing, because municipalities don’t use it.

While we will be calculating taxes on the municipal level, we will exclude school district tax. My ultimate purpose is to examine the differences of the infrastructure’s condition, between a neighborhood with a solid cash flow as opposed to a struggling one. The tax rates for townships are listed as millages. Allegheny County lists the millages for all townships here. I compiled the 2021 rates into a CSV file which you can download here. We will calculate the municipal property tax in the “mutate()” command below, and simplify the assessment table by only selecting the rows that are of our interest.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
assessment.csv = read.csv('assessments.csv');
taxrate.municipal = read.csv('municipal-tax.csv');
taxrate.county = 0.00473;

assessment = assessment.csv %>%
mutate(countyTax = TAXCODE == 'E' ? 0 : round(COUNTYTOTAL*taxrate.county,2)) %>%
mutate(municipality = str_replace(MUNIDESC,
# Regex to remove the "nth Ward - " from the municipality name
"^(?:(?:\\d+)(?:st|nd|rd|th)(?:\\s+)Ward(?:\\s+)-(?:\\s+))?((?:\\S|(?:\\s(?=\\S)))+)(\\s+)?$",
"\\1")) %>%
rowwise() %>%
mutate(municipalTax = (function() {
if (TAXCODE == 'E') { return(0); } # tax exempt
thisTaxRate = taxrate.municipal %>%
filter(str_to_lower(MUNI) == str_to_lower(municipality));

# Tax rate is in mills, so multiply the rate by 0.001
muniBuildingTax = LOCALBUILDING * thisTaxRate$BUILDING * 0.001;
muniLandTax = LOCALLAND * thisTaxRate$LAND * 0.001;

(muniBuildingTax + muniLandTax) %>%
round(2) %>%
return()
})()) %>%
ungroup() %>%
mutate(totalTax = countyTax+municipalTax) %>% # all taxes other than school taxes being paid
mutate(taxPerAcre = totalTax/(LOTAREA/43560)) %>% # 1 acre = 43560 sq. ft
select(PARID, MUNIDESC, COUNTYTOTAL, LOCALTOTAL, municipalTax,
totalTax, taxPerAcre, municipality);

We can then get the cumulative revenue per municipality by using dplyr’s summarize function. Note that, once you run the whole script for all properties, it may take a significant amount of time to calculate the revenue for all of the properties in the county.

1
2
3
summary.municipal = assessment %>% 
group_by(municipality) %>%
summarize(Properties = n(), `Muni tax rev` = sum(municipalTax));

We can compare our calculations with those in the financial statements of 2021, and locate the reported Real Estate tax revenue to see how well our numbers compare!

Figure 4 - Sum of all municipal tax revenue per municipality. No county taxes or school taxes included.

As is, I’m pretty satisfied by the comparison, so the next step will be to export the file back into a CSV. We will use the following one-liner:

1
write.csv(assessment, "propertytaxlist.csv");

Join the data into the map

Now that we have the data that we can easily tie back into the map, we will return to QGIS for analysis. The first thing that we’ll need to do is to include a base map so that we have a quick reference on the location of everything. So we would install the plugin for that. Go to Plugins > Manage and Install Plugins.., and in the search box, search for QuickMapServices and install it. This retrieval tool is one of the most commonly used tools for QGIS, and we will be relying on it.

Figure 5 - Plugin installation of QuickMapServices.

It is recommended that we enable the experimental maps available with this tool, so in order to do that, we will go to Web > QuickMapServices > Settings. Go to the “More Services” tab and click on the button “Get contributed pack”. Click okay on the warning prompt, and you should shortly get an alert saying that the installation was complete.

Figure 6 - Installation of more services

Go to Web > QuickMapServices, and take your pick in the options to choose a base map, but I will use OSM Standard map, as shown below:

Figure 7 - Project with OSM Standard Map

Now here comes the fun part. We will use the common columns in our R table and join them with the parcel shapefile. In order to do that, we will import our newly-created CSV “propertytaxlist.csv”, from the menu Layer > Add Layer > Add Delimited Text Layer..; then select the propertytaxlist.csv file and click “Add”. The CSV should then be listed in the layers panel as shown below. Close the window.

Figure 8 - The propertytaxlist CSV file will appear in the Layers panel despite nothing having changed on the map itself.

Then, we will select our parcels layer, right click > Properties…, then on the Properties window, go to the “Joins” tab, and click on the plus sign for creating a new join as marked on Figure 9.

Figure 9 - Join a shapefile's data with another object in this window.

The New Join button will give you a prompt asking for the “Join layer”, the “Join field”, and the “Target field”. We have only one eligible layer for joining, which is the CSV file that we imported, so that should be the “propertytaxlist” layer. Since we are joining by the unique identifier of the parcel, we would select the “PARID” field from the Join layer, and as we established in the beginning of this tutorial, the “PIN” is the corresponding field on the parcel shapefile. Figure 10 shows the prompt filled out below.

Figure 10 - How you would want to join the CSV attributes into the parcel layer is shown here.

Once you hit OK, the csv layer should be listed as a Join layer, which means we can turn our calculated numbers into visuals. The map itself remains unchanged.

We are going to transform it into something like a heat map - colors on the spectrum representing the landowner’s tax expenses per area of property. In this same property window, go to the Symbology tab, and from the dropdown on the right, select the “Graduated” option. On the Value field, you could click the epsilon on the right in order to help you find the variables that you need, but for simplicity’s sake, I will just tell you what to put in for the Value field:

1
propertytaxlist_totalTax / Shape_Area 

We will be categorizing the lots into “classes”. So in the “Classes” field in the bottom right, let’s change it to 30, then click the button “Classify”. It might take a while for the program to classify all the lots, but when it’s done, you should have a classification system like this one on Figure 11.

Figure 11 - Lots classified into bins.

And before we finish here, let’s modify the color ramp. Go ahead and click on the color ramp, and I’m going to put in these values for my gradient, with the color ramp shown below:

Figure 12 - Color ramp.

Once you’re satisfied by the color ramp, the last thing we’ll fix is to remove the outlines around the parcels. Select ALL the class entries created, then click on the color block next to where it says symbol and click on the option that says “Simple Fill”. Next to the option that says “Stroke style”, we are going to change it to “No Pen” like shown on Figure 13.

Figure 13 - Select "No Pen" if you do not want an outline around the property boundaries.

Once you hit “OK” on the prompt, then “Apply”, your map should then be colorized. Since we’re going to compare lots between municipalities, we could include municipal boundaries as well. You can find its shapefile here. I’ll leave it for you to figure out how to stylize it. The boundaries are shown in the tax heat map shown below.

Figure 14 - Heat map of property tax billing per acre in the Greater Pittsburgh Area.

Click here to download the interactive map of Figure 14 for QGIS.

What we just did - we created a heat map representing the landowner’s tax expenses per acre of all properties in Allegheny County. You can explore the map, or look at the one I created, and see if the boundaries between municipalities create a “scar” exposing disparities. (Again, school tax excluded.)

You can take a little bit of time to really look through the heat map shown on Figure 14! My legend on Figure 14 is shown on the bottom left. Note that the last four entries on the legend - Green-blue, Blue, Indigo, and Violet, are incrementally the most substantial. I will proceed with the analysis of three areas of interest. There’s definitely more that can be discussed about this topic, and that’s why I broken this blog entry into multiple parts.

Analysis #1 - Pittsburgh-Swissvale-Wilkinsburg

Along the corridor of South Braddock Avenue is a somewhat-upscale neighborhood known as Regent Square, which straddles the border with Swissvale and Wilkinsburg. But honestly, even for native Pittsburghers, it’s difficult to know where the end of these city limits are, because these neighborhoods almost perfectly blend together. The fascinating thing about this area is the fact that the city boundary cuts diagonally through rowhouses. In terms of tax millages, Pittsburgh charges 8.06 mils, while Swissvale charges 8.00 mils, and Wilkinsburg charges a staggering 14.00. However, this is offset by the fact that Wilkinsburg’s property value is generally lower than the other two neighborhoods.

Figure 15 - Top left: Google 3D aerial view of the Regent Square area at the border of Swissvale and Wilkinsburg. Only the decision to pave over some brick roads stands out. Top Right: Heat map representing annual tax billing per square feet. Nearly all properties shown fit in the upper echelon in terms of value per square foot, and all three municipalities charge similar taxes. Bottom: Street view of Milton Street, looking from Swissvale towards Pittsburgh. Note that there is no noticable difference in terms of neighborhood quality at the street level.

Analysis #2 - Brentwood-Whitehall

Both Brentwood and Whitehall are adjoining municipalities of about the same size; and therefore are great candidates for comparing public works. Whitehall’s tax millage of 4.42 is less than half of Brentwood’s rate of 10 mils, and Whitehall has a larger share of single-family low density housing like a suburb, as opposed to Brentwood’s traditional urban neighborhood. We can assume that one borough will have less funds than the other, but how different are they in reality is the question. You can see in the heat map below, in Figure 16, that Brentwood has a larger funding pool.

Figure 16 - Tax revenue heat map of Whitehall (left) and Brentwood (right). Note that Brentwood is significantly smaller than Whitehall.

Well, by PA state law, all municipalities are required to have a commercial district, so let’s look there.

Keep in mind that when you’re looking for particular elements, these are particular ones maintained mostly by the local authorities. Such public works may include:

  • Street trees
  • Actual maintenance of brick and concrete pavements.
  • Parks with good amenities such as a tennis court, large playground, ice rink
  • Clean Roads, if in the city
  • Decorative streetlights
  • ADA-compliant ramps with warning detection strips for the blind
  • Outdoor artwork
  • Bus shelters with sidewalks and crosswalks

Take a look at Figure 17 to see what is missing in Whitehall’s shopping center on Baptist and Grove Road. There may be less appeal to aesthetics such as street trees compared to Brentwood’s, but I take particular offense at the lack of safety at the bus stop. The bus system is managed as part of a countywide public service. The lack of interest for the township to come around and provide safe street crossings may be a local problem, but it’s one that affects outside commuters that are there for work, even if said commuters came from a borough that charged higher taxes into a good pedestrian network.

Keep in mind when you’re looking at Figure 17 - Baptist Road itself is a State Route, meaning its maintenance is the responsibility of PennDOT rather than the local authorities. PennDOT roads are funded by gas taxes, not real estate tax revenue or municipal funds. Local authorities can in any case work with PennDOT to add pedestrian crossings, sidewalks, and other public works inside their route, but those improvements come out of the borough’s pocket. (For future reference if you don’t know - most minor State Routes throughout the U.S. can be distinguished from local roads by their white shoulder stripes, double-yellow centerline with rumble strips, and general lack of waves or ripples on the paved surface).

On a similar matter, Brownsville Road in Brentwood is a particular type of local road that is subsidized by gas taxes, but is still owned and managed by the local authorities.

In any case, when you look at Whitehall’s infrastructure compared to Brentwood’s, you realize that you probably got about what you paid for. Nonetheless I also suspect that Brentwood may not be as efficient with their funds as other municipalities. If you’re interested, you can review their financial reports on their ClearGov profile, though I must warn the reader, some municipalities have been known for giving less accurate figures on the individual details.

Figure 17 - Brentwood's shopping center vs Whitehall.

Analysis #3 - Dormont

As shown on Figure 18, the entire boundary of Dormont has one of the most noticeable disparities. From first impressions, it almost looks like the amount of their revenue translates into an equally pleasant and strong neighborhood, as this borough does have a particularly nice shopping center on Potomac Avenue. However, even that area only spans three blocks.

Figure 18 - Dormont Borough.

But let me show you something. In Figure 19, here’s a Dormont residential area straddling Mount Lebanon - an area that only charges 4.41 mils - less than half - in municipal taxes compared to Dormont’s 9.97 mils. This is the intersection of McFarland Avenue and Dell Avenue. Despite one side of the street being similar to the other, one side charges significantly higher in taxes than the other. The general character and upkeep of the neighborhood remains identical.

Figure 19 - Dell Avenue and McFarland Road. The boundary between Dormont and Mount Lebanon follow the sidewalk at the edge of McFarland Avenue. Isn't that obvious?

In cases like this one, one may suggest the plausibility of the borough being in financial distress for a long time, creating a delay before any major public works projects go into effect until the borough’s council members determine that they have a comfortable amount of capital. The lack of a clear understanding leads my intuition to find their financial statement/audit for 2021.

But here’s the kicker - I looked for this 2021 report and, what is odd is that, it’s not on their website. In fact as of writing this article, and despite it being a requirement by state law, no financial audits are available from 2017-2023. I did notice at least that their 2016 report lists their real estate revenue of $3,800,447. That number is pretty close to our 2021 calculation of $4,104,937.22. Their gross revenue is even higher than Brentwood, yet Brentwood is almost double the acreage of Dormont. Where are the rest of the reports?

While I was looking, I stumbled upon something else that I wasn’t expecting. It turns out with Dormont, there is a bigger story. I found this link on their site - a forensic analysis on suspected fraud.

“Wire transfers and electronic payments could be made by Bookkeeper without review by other Borough employees. Bookkeeper had both electronic payment authorization tokens and could make transfers without approval. No oversight of ACH
payments.”
- Dormont’s Forensic Analysis on Suspected Fraud, Finding No. 6

Apparently the borough found $300,000 of undeposited checks at the bookkeeper’s home that worked there for 27 years before getting fired in June 2022 and was found using the borough’s credit cards for that individual’s expenses. Even 300 grand is pretty substantial in comparison to a borough of that size. In some boroughs, that may be half an entire infrastructure budget. This is an ongoing investigation and therefore we won’t really know the full scope of the fraud, but from the surface, it’s really difficult to look at it any other ways than a scandal that shortchanged the taxpayers.

I also found, after this discovery, that the state publishes the financial statements provided by every local government in the state. You can find these reports here. But since there’s an ongoing investigation on financial fraud in this borough, obviously that’s going to cast doubt on the credibility of the reported figures.

Final thoughts

The tax revenue that we analyzed goes mainly to pay for infrastructure maintenance, a portion of public safety (police), and administration pay. So that raises the question, what actually causes these disparities even if some parts inside the borough lower taxes appear to be in pristine shape? We will explore that in the next chapter of this two-part analysis.

This post was inspired by the rising popularity of urban infrastructure and revenue analysis by projects similar to Urban3 and popular YouTube channels such as City Beautiful, Not Just Bikes, and many other groups that popularized the discussion of neighborhood sustainability.