Publications

Discussion Papers

Ozone Standard Exceedance Days in the South San Joaquin Valley

Jan de Leeuw and Shuojun Wang

This note analyzes ozone standard exceedance days in Bakersfield, Arvin, Oildale, and Shafter since 1989. We have plotted violations of both one-hour and eight-hour federal and state standards (data from CARB monitoring stations). We have also drawn linear regression lines to estimate trend. They show a modest long-term improvement (around 20%) in violations of one-hour state standards, but generally a smaller increase in violations of eight-hour state standards. They also show larger improvement in urban Bakersfield than in the rural areas.

Principal Component Analysis of a Finite Number of Curves

Jan de Leeuw

We introduce and discuss Principal Component Analysis (FPCA) of curves, using only relatively simple matrix algebra and optimization results. Different loss functions, and various ways to impose constraints on the solution, are also discussed. The techniques are applied first to some theoretical examples involving smooth curves, and subsequently to a data set with one year of hourly ozone measurements in Lebec, Kern County, California. R code for all functions, tables, and figures is included for reproducibility.

The Lebec Air Monitor

Jan de Leeuw

Data collected between February 2006 and February 2007 with an O3 and PM-2.5 monitor in Lebec, California are analyzed. Extensive analysis is not possible, because of the short timespan, but we give descriptive statistics, mostly as plots. The influence of wildfires on PM2.5 and of I-5 truck traffic is discussed briefly. It is noted that there are two schools in close vicinity. In order to get more information about long-term developments, including prediction, more extensive and systematic monitoring is necessary.

Month, Weekday, and Hour Effects in the Lebec Air Monitor Data

Jan de Leeuw

Data collected between February 2006 and February 2007 with an O3 and PM-2.5 monitor in Lebec, California are analyzed. In this paper we analyze the data using simple least squares imputation and an additive main effect model to look at the effects of month-of-the-year, day-of-the-week, and hour-of-the-day.

Dissertations

Separability Testing for Point Processes with Covariates and An Application to Wildfire Hazard Assessment

Chien-Hsun Chang

In modeling marked point processes, it is typically assumed that marks are separable from the spatial-temporal coordinates. Tests have been proposed in the simple marked point process case to investigate the separability of the mark distribution. These tests are here extended to the case of a marked point process with covariates. The extension is not trivial, and covariates must be treated in a fundamentally different way than marks and coordinates of the process, especially when covariates are not uniformly distributed. Solutions are proposed to the problem of how to proceed when the separability hypothesis is rejected. An application of separable marked point process models with covariates is given to the assessment of the Burning Index in predicting wildfire activity. An examination of the Los Angeles County data reveals that the Burning Index predicts poorly compared to simple alternatives using just a few weather variables.

Theses

A Time Series Analysis of the Pacific Decadal Oscillation

Nicholas S. Nairn-Birch

The Pacific Decadal Oscillation (PDO) index defines the leading mode of monthly sea surface temperature anomalies in the North Pacific Ocean. Time series analysis in both the frequency and time domains is applied to 107 years of monthly PDO index values. Simulations of a model fitted to the data are examined for the occurrence of a particular event seen in the raw data; a probability for this event is calculated. The simulations are further used to tabulate histograms for the mean length (in years) of a positive phase, and the absolute difference between the longest positive and negative phase (in years). The results show that the probability of occurrence of the event in the raw data is relatively low (9.9%). The raw data’s mean positive phase length is close to the simulation mean and median, while the absolute difference in maximum positive/negative phase lengths corresponds to a p-value of 14.9%.

Analysis of Average Annual Traffic Count on the Interstate 5 Project

Angela Chang

Over the past years the volume of traffic rushing over the freeways of the Los Angeles area has increased, as evidenced by the longer commute time and the seemingly endless hours people spend in traffic jams. These hours spent sitting behind the wheel of a stopped car represent lost and wasted time. A driver is repeatedly switching radio stations trying to determine a route out of a traffic snarl. The mere anticipation of a traffic jam can lead people to leave for work an hour earlier than necessary in an attempt to avoid such time-consuming problems, rather than making breakfast for the family or walking children to the bus stop. Imagine these scenarios played out every morning and afternoon for thousands of drivers across the Los Angeles area. The result is the loss of hundreds of work hours and depleted recreational and family time. A society that takes appropriate steps to remedy such a detrimental situation cannot help but improve itself overall. Towards this goal, this paper will examine the analysis thus far of the Average Annual Traffic Data provided by the California Department of Transportation. First the data and the manipulations to the data will be described. Then trends in the data will be examined, after which appropriate models will be examined to describe the increase in traffic volume seen over the last decade and a half. Such a model would be a useful tool in focusing limited transportation resources by attempting to predict future traffic growth patterns based on past observations. The purpose of this report is to act as a stepping stone for further analysis of traffic growth patterns on the Interstate 5 freeway. As this is ongoing research, it is hoped that this paper will help further understanding of the data.

Geographic Estimation of Chumash Towns and Test of Randomness using Bootstrap

Yong Fu

This report presents a method of estimating the longitude and latitude of Chumash towns at the time of European settlement using the ArcGIS software. We also employ the bootstrap to test the randomness of the locations of fifteen Chumash towns for which mitochondrial DNA (mtDNA) information is available.

Impact of Weather Covariates on Wildfire in Tanjung Puting National Park

Esa A. Eslami

This paper explores wildfire modeling based on meteorological variables for Tanjung Puting National Park, located on the island of Borneo. Based on the point process models developed in other papers to describe wildfires in Los Angeles County, a separable, or entirely multiplicative, model is developed, and each individual component is then estimated using kernel smoothing and parametric estimation. Bandwidth selection and tests for separability are explored, as are problems that may arise from the specific data.

Model Application to Air Pollution Data of SCCX

Kuei-yu Chien

Research on the variation in ozone concentration includes the analysis of how it changes over time. We present a detailed discussion of how to apply the seasonal ARIMA (SARIMA) model, also known as the seasonal Box-Jenkins approach, to the daily average maximum ozone data extracted from the California Air Resources Board, and, based on the fitted model, we forecast the level of ozone in the years to come. The SARIMA model is parsimonious yet sufficient to describe non-stationary data such as the underlying ozone data. We first reduce the stochastic series to a stationary series and apply the MA and AR models to further describe the data; considering the seasonal pattern occurring in the original data, we then use the seasonal ARIMA model to adjust for the seasonality and improve the results of our forecast. We examine our model by three methods: the model checking process; comparison with other possible models, such as the non-seasonal ARIMA model and the mixed ARMA model; and in-sample prediction compared with the actual values. After applying the three methods, we conclude that the SARIMA model provides a good description of our data.

Santa Barbara County Ambulance Response Performance Under Load

Joshua Chang

Santa Barbara County ambulance dispatch data from 2006 were analyzed to determine the effect of system load on performance. A variable called ‘neighbors’ (η) was calculated as a measure of location-specific system load, where η is the number of ambulances that were dispatched within the previous hour within a fixed distance of 20 kilometers. It was determined that calls for which there are neighbors have a statistically higher proportion of response time violations than calls for which there are no neighbors. It was found that, on average, the odds of a response time violation increase by 19.1% for each additional neighbor. However, the fact that system load degrades system performance may not itself be very helpful. It was determined that the effect due to system load is not homogeneous in space. The inhomogeneous K-function for locations of calls where neighbors corresponded with a response time violation indicated that there is more clustering compared to inhibition than expected under the hypothesis that these points were drawn without bias from the overall spatial call conditional intensity. This paper identifies regions where system load most impacts ambulance response performance.
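
As an illustration of the ‘neighbors’ definition above, the following is a minimal Python sketch, not the author's implementation. It assumes each call is a (dispatch time, latitude, longitude) tuple, measures distance as great-circle (haversine) distance, and counts only strictly earlier dispatches; the function names, the coordinates, and these choices are hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighbor_counts(calls, window=timedelta(hours=1), radius_km=20.0):
    """For each call, count earlier dispatches within `window` and `radius_km`.

    `calls` is a list of (time, lat, lon) tuples. Assumption: a dispatch at
    exactly the same time as the call is not counted as a neighbor.
    """
    counts = []
    for i, (t_i, lat_i, lon_i) in enumerate(calls):
        eta = 0
        for j, (t_j, lat_j, lon_j) in enumerate(calls):
            if i == j:
                continue
            elapsed = t_i - t_j
            recent = timedelta(0) < elapsed <= window
            if recent and haversine_km(lat_i, lon_i, lat_j, lon_j) <= radius_km:
                eta += 1
        counts.append(eta)
    return counts


# Made-up example calls (approximate coordinates, for illustration only).
example_calls = [
    (datetime(2006, 1, 1, 10, 0), 34.4200, -119.7000),   # Santa Barbara
    (datetime(2006, 1, 1, 10, 30), 34.4358, -119.8276),  # Goleta, about 12 km away
    (datetime(2006, 1, 1, 10, 45), 34.9530, -120.4357),  # Santa Maria, far away
    (datetime(2006, 1, 1, 11, 15), 34.4200, -119.7000),  # Santa Barbara again
]
print(neighbor_counts(example_calls))  # [0, 1, 0, 1]
```

The quadratic loop is adequate for a sketch; a full year of dispatch records would more plausibly be sorted by time so that only calls in the preceding hour are compared.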

Time Series Analysis of Air Pollution CO in California South Coast Area, with Seasonal ARIMA model and VAR Model

Xiao Han Cai

A seasonal autoregressive integrated moving average (SARIMA) model and a vector autoregressive (VAR) model were applied to modeling the time series of monthly maximum one-hour carbon monoxide (CO) concentration in the California South Coast Area. The SARIMA model presented how the current month's air pollutant concentrations depended upon the previous months' air pollutant concentrations. Predictions were made from the fitted model. The VAR models showed the association between the current month's CO concentrations and the covariates, including precipitation, temperatures, solar radiation, and miles of vehicle travel. Through analysis of the models' impulse response functions and forecast error variance decompositions, it is demonstrated that: (i) Precipitation does not seem correlated with CO concentration. (ii) Environmental plus traffic elements do exert a long-run effect on CO level. (iii) Spurious regression is an important problem for further data analysis.

Time Series Analysis of Air Pollution in the City of Bakersfield, California

Shuojun Wang

An autoregressive/moving average (ARMA) approach and a multiple linear regression (MLR) approach were applied to modeling the time series of monthly maximum one-hour ozone, carbon monoxide, and nitrogen dioxide concentrations in the city of Bakersfield in the San Joaquin Valley Air Basin. The ARMA model presented how the current month's air pollutant concentrations depended upon the previous months' air pollutant concentrations. The MLR models showed the associations between the current month's concentrations and the meteorological covariates, including precipitation, temperatures, and solar radiation. By comparing these two models, we concluded that the ARMA models are slightly better at fitting the air pollution data than the MLR models.

Analysis of Interstate Highway 5 Hourly Traffic via Functional Linear Models

Napat Buddhangkuranont

This analysis examines the traffic count on Interstate Highway 5 (I-5) at three locations north of Los Angeles. A functional data analysis was applied to the hourly data to identify potentially similar traffic patterns among the days of the week and to determine on what day and at what hour the most cars are on the freeway. The analysis suggests that the traffic count follows different patterns at different locations. The northbound traffic counts at the three I-5 locations and the southbound count at Smokey Bear suggest that Monday, Tuesday, Wednesday, and Thursday have fairly comparable patterns, while the southbound counts at State Route 126 West and Wheeler Ridge suggest that only Tuesday and Wednesday have similar traffic patterns. Moreover, Sunday seems to have the highest vehicle count among the days of the week, and traffic seems to be most congested at 3 pm.

Time Series Analysis of Particulate Matter 2.5 in the San Joaquin Valley

Judy Yang Hee Kong

The San Joaquin Valley (SJV) of California is one of the most polluted regions in the United States, and is classified by the U.S. Environmental Protection Agency (EPA) as a serious nonattainment area for both ozone and fine particulate matter (PM2.5). Pollution sources in the SJV account for about 14 percent of the total statewide criteria pollutant emissions. The Valley ranks second with respect to the nation's worst ozone air quality. Although there has been some progress in the SJV over the past ten years, the rate of progress has been slow in comparison to other areas of California.

To bring the entire Valley into attainment of the PM2.5 standard, it is necessary to understand the past and present movements of PM2.5 in order to estimate its future movements. The first and foremost step is a time series analysis of PM2.5, which can then be used to plan for reducing PM2.5 to the attainment standard. Basic time series analysis methods, such as plotting raw data and decomposition, were used, and various additional methods, such as ACF, ARIMA, and regression analysis, were applied.

Center for Environmental Statistics
