Array of Things

We are very grateful for the valuable input provided to us by the public. Below, we respond to specific comments and outline how they have influenced the final version of the governance and privacy policies. Comments regarding punctuation, formatting, grammar, and other non-policy issues are not included here, but we appreciate those suggestions as well and in most cases have adjusted the document to address those comments.

For updated information on the Array of Things, visit our homepage.

How many prototypes will be installed in Summer 2016?

We expect to install 40-50 in Summer 2016, increase to 80-100 by end of 2016, and add 200 more in 2017 and in 2018.

Presumably a method for doing this [suggesting node locations] will be required in the public web site.

We have created a form to suggest node locations and/or research ideas, and added a link to the front page of the AoT website (www.arrayofthings.us).

Can more sensors be placed in the north near O’Hare?

A form has been created for interested people and community groups to propose locations. The form is available on the front page of the AoT website (http://arrayofthings.us). (See also the policy documents, which outline the selection criteria and process.)

Can a homeowner elect to have a sensor installed on their property?

That is technically possible but the project is not able to engage homeowners at this stage. We hope to revisit this in 2017.

Has the National Weather Service shown interest in this work?

Not yet, but Array of Things has interacted with the EPA and other federal agencies.

What will you do about clogged optical lenses or sensors?

We anticipate facing this challenge in the initial node deployment. Many things can clog sensors, including birds, insects, dust, and ice. We are evaluating the use of a Gore-Tex sock around the air quality sensors to reduce ice and dust exposure. As with many other environmental challenges to the node technology, we will learn from observing and analyzing the performance of the first round of nodes and make changes to improve the reliability of subsequent nodes.

Tell us more about the partners involved in this work - specifically the SAIC & Smart Chicago. Why are they involved?

The School of the Art Institute of Chicago (SAIC) has been involved in the physical design of the nodes since the early conception of the project. Faculty members from SAIC also worked with the AoT project and faculty members at Lane Tech High School to develop and pilot a high school curriculum on Internet of Things and sensor networks.

Smart Chicago Collaborative is assisting the project with community engagement, enabling the project to leverage their experience in this area.

The National Science Foundation funding program required all proposals to include cost sharing, and the project received engineering support from six companies (Cisco, Intel, Microsoft, Motorola Solutions, Schneider Electric, and Zebra Technologies). In early 2016 the University of Chicago expanded its longstanding AT&T partnership to include the Array of Things, making AT&T the seventh industry partner.

Scientists from over 30 universities and national laboratories have been active in workshops and project planning for the past three years, and will continue to advise the project and to leverage the data to better understand urban systems.

When will the Lane Tech Curriculum be available to everyone?

We are currently seeking funding to package the curriculum developed in 2016 for Lane Tech and are hoping it will be available for other schools in 2017. We are also working with Microsoft to create a web portal where this and similar curricula can be made available along with AoT data.

Specify what copyright, or a list of possible copyrights, the data will be made available under.

The data produced by the Array of Things will be published with terms of use similar to those adopted by other open data services. These will be based on the MIT license terms, adapted to apply to data. Information on the MIT license can be found at https://en.wikipedia.org/wiki/MIT_License. The National Weather Service also provides a good example of the terms of use that Array of Things will adopt (http://www.weather.gov/disclaimer).

Would like to see an open nomination process for some percentage of seats [on the Executive Oversight Council]. Yes, it's Chicago, but we're aiming for a _better_ Chicago.

The program operators will be working with the co-chairs to develop a process for open nomination and for ensuring an appropriate balance of stakeholders in the council. The ultimate decision on council members will come from the co-chairs.

It could be nice to see more detail here about how requests to change the software will be evaluated. It's not too hard to imagine privacy issues coming up here.

The software used by the nodes is open source, enabling community-based testing, input, and review. Responsibility for accepting software changes and for managing the software versions running on nodes lies with the project team, all of whose work is governed by ethical and privacy agreements, including the project governance and privacy policies. The review process includes checks for compliance with privacy policies and goals as well as stability, security, and other factors.

Avoid language of "data ownership” - data cannot legally be owned (in the United States). This is not just cosmetic, it's important not to introduce a legally indefensible concept into a document that will (we hope) be a binding understanding of how the AoT will work. The idea that data will not only be owned by someone, by the University of Chicago particularly, adds a level of political sensitivity that is unnecessary and possibly counterproductive.

Data ownership is standard legal language. By “ownership” we mean the University of Chicago will hold the copyright to the data and be responsible for ensuring that the data remains open and available through the course of the project and beyond, and that any data publication is done within the rubric of the privacy policy. To avoid confusion or unnecessary debate we have modified the text to use the word “stewardship.” The terms of usage will be similar to what is commonly used for open data such as is available from the National Weather Service or other open data sites. These terms will be based on the MIT license terms, adapted to apply to data. Information on the MIT license can be found at https://en.wikipedia.org/wiki/MIT_License. The National Weather Service also provides a good example of the terms of use that Array of Things will adopt (http://www.weather.gov/disclaimer). These terms of use will be published along with the data.

Add a provision to pull the plug. "If a regular evaluation determines that the AoT is unable to meet the goals of the program, or if the program is producing a preponderance of adverse effects, it may be discontinued." (or the like) The public may well be scared at a new level of surveillance/coveillance and reassurances that misuse of the data will be stopped will go a long way towards encouraging acceptance.

The regular evaluation of the project includes the possibility for discontinuing the project. Additionally, the National Science Foundation will review the project annually. The project’s funding ends in 2018, after which all parties will review the project and determine whether to extend it. As detailed in the privacy policy, the devices will not collect personal information nor provide any other surveillance capability.

Who approves the experimental sensors and what criteria will they be using to decide what/who gets approved? Will experimental sensors be allowed to collect PII and store that information on private servers?

Other than several candidate environmental sensors such as rain or carbon dioxide, no new or experimental sensors are expected prior to the first annual review of the project and policies.

Experimental sensors proposed by parties outside the AoT project team will be reviewed by the project team (system operators) and the Scientific Review Group. If the sensors have any potential to detect personal information then a plan must be included as to how their use will comply with the privacy policy, and in these cases that plan will also be reviewed by the Technical Security and Privacy Group. As part of the agreement to test new sensors, outside parties must agree that once the sensor research/evaluation is complete then all validated data stored at Argonne will be published openly.

What about groups or individuals who do not fall under any of these categories? E.g., a volunteer group that is NOT a not-for-profit.

We have added an “including but not limited to” clause to stress that we are seeking partnerships with additional entities not listed here.

What about under subpoena or warrant? [access to images or sound]

The only data that will not already be public is a limited set of images used to improve computer vision software. These images will contain no sensitive PII, but some may show faces or license plate numbers. The University of Chicago, as copyright holder of the data, would be responsible for responding to law enforcement requests. The University of Chicago policy for all external requests (UChicago Policy 1007) can be found here.

The more AoT revises policy, and the more of itself it gives, the more daunting the public review task becomes. We all know there's a bright future for AoT, but more imagination towards shaping policy that empowers people to interact with our shared picture of the urban system must occur.

We agree and will continue our emphasis on public engagement and transparency, welcoming additional imagination and input.

What happens if in a couple of years - we want to look into gun violence prevention, will you program it to look at shootings/firearms? If you add this type of data or any other forms of data that is not currently captured, what is the process of adding (or removing) programming?

As outlined in the published privacy and governance policies, new sensors or data collection capabilities will undergo multiple review processes covering technical feasibility, scientific merit, and implications for privacy and compliance with the privacy policy. The project website will be updated to include any new sensors or capabilities if they pass these review processes. Similarly, if capabilities are removed from the devices this will be noted on the website and elsewhere.

Not saying "camera imagery" is misleading. The data that will be gathered IS IMAGES. Pedestrian and vehicle movement information will be inferred from that data. And it is absolutely 100% certain that unless this policy says that imagery will only be used for pedestrian and vehicle movement, then it WILL be used for something else. Using data in creative ways is exactly what data scientists get paid to do. This absolutely must be rewritten.

This wording was not intended to mislead—in normal language we refer to images with the understanding that they come from cameras.

The wording, “This includes, but is not limited to…” makes it clear that measurements of pedestrian and vehicle movement are examples, and that we expect new image processing applications to be developed. The purpose of the document is to outline policy and processes to protect privacy while supporting new image processing capabilities not listed in these examples (for instance, detection of standing water or ice on the roadway). This includes a process for reviewing the potential privacy impact of any data published by new image processing capabilities to ensure that they are compliant with the privacy policy.

This paragraph should contain two separate lists. One list describes what data is collected (camera images, raw audio, vibration data, temperature, etc.), and one list describes what data is shared, including the derived features such as pedestrian/vehicle counts.

The policy documents include reference to the project website, where the project maintains information on the specific capabilities of the nodes. The website currently has this information, including a list of initial features/information to be extracted from images, in the node configuration schematic.

In addition, the technical specifications and data sheets for each sensor on each node will also be published with the data.

The suggestion to provide tables is a very good one and thus this data will also be formatted into tables for clarity.

This is a concerning piece of wording and implementation of this proposal. This makes me have to ask about the specific management rules of these images - who has access, how long will they be stored, and how do they get deleted? If these images are never deleted, then the entire PII section of this document is void from a technical perspective. With enough images taken over time, one can find an individual based on their clothing, follow them through each image, and eventually determine where they work and where they live. From there, it's pretty easy to figure out the rest of that person's identity. Blurring out images and license plates is not enough. To me, I think it would be better if a smarter solution could be implemented to where images are not even needed for these metrics (i.e. traffic patterns). I don't know what that solution would be, but I'm more afraid of the potential of future harm to be done with these images more than anything.

The policy document has been updated to clarify that image processing for the street-facing cameras will be done on the nodes themselves, and the images will then be deleted - not saved or transmitted. For calibration of image processing software, a fraction of 1% of images will be randomly saved. This limited set of images will contain no sensitive PII. Some may potentially show faces or license plates, and while these are not considered sensitive PII the project has elected nonetheless to limit access to those images to approved individuals who have signed usage agreements, as outlined in the published privacy policy document.
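As a rough illustration of the flow described above, and not the project's actual node software, the following Python sketch processes each frame, keeps only derived counts, and retains the raw frame for calibration with a small probability. The names count_objects, handle_frame, and CALIBRATION_FRACTION are hypothetical, and the 0.5% rate is an assumed value chosen only because it is below 1%; the policy states "a fraction of 1%" without giving an exact figure.

```python
import random

# Assumed retention rate for illustration; the policy says only "a fraction of 1%".
CALIBRATION_FRACTION = 0.005


def count_objects(image: bytes) -> dict:
    """Hypothetical stand-in for on-node image processing.

    A real node would run computer vision software here; this stub just
    returns the shape of the derived data (counts, never the image).
    """
    return {"pedestrians": 0, "vehicles": 0}


def handle_frame(image: bytes, rng: random.Random, calibration_store: list) -> dict:
    """Derive counts from one frame and drop the frame unless it is sampled."""
    counts = count_objects(image)
    if rng.random() < CALIBRATION_FRACTION:
        # Rare case: keep the raw frame for access-controlled calibration use.
        calibration_store.append(image)
    # The frame is not returned, so only the derived counts leave this function.
    return counts
```

In this sketch the sampling decision is made independently for each frame, so which frames are retained does not depend on what they show.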

Although most citizens may not care, the technically minded ones would be interested in knowing exactly how this data is secured and encrypted. There are different ways of doing so and being transparent about that is important IMO.

All data transmission is encrypted via OpenSSL. Although none of the limited set of calibration images (the fraction of 1% not deleted in the nodes) will contain sensitive PII, the methodology for securing calibration images will follow NIST guidelines for control, audit, and protection of sensitive data, for example using AES-256 or similarly strong encryption methods. The open source software stack is also published on GitHub (waggle-sensor), where the project encourages the technical community to comment and contribute to its evolution.

Will all raw data that is collected be uploaded to the secure facility? Will some of the raw data be deleted on-site after processing?

All sensor data, with the exception of the limited set of calibration images from the street-facing cameras, will be uploaded to an open data system operated by the University of Chicago and Argonne National Laboratory. This data, containing no PII, will be retained for future use indefinitely. In normal operation, images will be processed on the node and deleted within minutes. For calibration, a fraction of 1% of images will be maintained and protected at the University of Chicago. If and when the Array of Things project concludes, these images will be either processed to remove possible identifiers (faces or license plate numbers) or deleted.

Who is in charge of this project?

The project is co-led by Brenna Berman, City of Chicago Chief Information Officer, and Charlie Catlett, Senior Computer Scientist at the University of Chicago and Argonne National Laboratory.

The City is responsible for installation of the nodes and oversight of the program.

Catlett is responsible for the creation and operation of the hardware and software systems, with oversight from the University of Chicago, Argonne National Laboratory, and the National Science Foundation. Berman and Catlett will co-chair the Executive Oversight Council, with participation from other organizations and stakeholders. Catlett is also the leader of the “Program Operators” group defined in the governance document.

Is the camera used for public safety purposes?

Array of Things does not have a law enforcement component; it is designed to collect and publish data about the city’s environment, infrastructure, and overall activity. Where it intersects with public safety are efforts to improve traffic safety; for instance, using pedestrian, bicycle, and automobile traffic counts at busy intersections to develop safer streetlight patterns or crosswalks.

Why did the first community meeting happen in Pilsen?

Several organizations and individuals approached the project with interest in air quality and related health issues such as asthma, including some with concerns about factories and expressway emissions. After learning that the Instituto del Progreso Latino, located near factories and the Stevenson Expressway, focuses on healthcare careers, we met with their leadership to explore their interest in air quality data and engaging students. The Pilsen neighborhood also has a history of community organization around improving air quality, and nodes in this part of the city [Pilsen] will help to better understand how factories and expressways affect air quality.

Is this happening in other places or cities?

The technology platform used by the Array of Things has been installed in several environmental test areas such as the Indian Boundary Prairie National Natural Landmark in partnership with Northwestern University. Additional environmental science projects have contacted the project about similar installations.

Array of Things has also been approached by over 60 cities and universities worldwide with requests to support pilot projects in their cities.

How might the project/sensors change?

Selection of the current set of sensors was driven by input from scientific workshops with scientists from dozens of universities. There are some sensors that we expect to be able to add in the near term, such as precipitation, wind, and carbon dioxide, but that are either too expensive or otherwise impractical for the first set of installations. We expect that these and other new sensors will become practical in 2017 and beyond, and hope to integrate them into the platform.

You mentioned a hypothetical about counting dog walkers? Could you potentially catch people who don’t pick up after their dogs?

While it may be technically possible to use image processing to detect complex situations such as this, the project is not intended to be used for surveillance or identifying any individuals in any way.

What’s the purpose of collecting the nonpublic raw data/images?

There are several ways that these images are used for developing and improving image processing software. To illustrate, consider software intended to count bicycles. First the software must be “trained” to recognize bicycles by having the software process a set of images of bicycles. Second, the software must be tested and improved to perform under various conditions that might impair the image, such as different lighting and weather conditions.

Why collect multiple images at different times?

As noted above, images at different times provide a diversity of lighting and other conditions with which to train and test the image processing software. Different times also provide different patterns of street activity, such as can be seen with traffic from hour to hour.

What about measuring cancer-causing chemicals?

One sensor being installed will measure very small particles (particulate matter 2.5 micrometers in diameter or smaller, or “PM 2.5”), some of which have been linked to cancer and other health conditions, such as asthma. As sensor technology and costs allow, the project intends to continue to add sensors, with particular interest in those related to health.

Can communities influence placement of sensors?

Yes, Array of Things welcomes community suggestions for the placement and use of sensor nodes, and provides a form on its website for these submissions. For instance, a community member may suggest moving the sensors one or two blocks over for a particular reason that the Array of Things team had not considered, or did not know about.

Will all of the sensors be placed at the same height?

The sensors will be on traffic signal poles so that they can use the 24/7 power source. Most of these will be roughly 20 feet above the roadway, but the project is also exploring additional future sites such as building exteriors and rooftops, as this height variation will be important for a fuller understanding of how various pollutants flow through the city.

Why aluminum for the nodes?

The rectangular electronics enclosures are made from aluminum in order to draw heat away from the computers inside. The white sensor enclosure, which resembles a beehive, is made from molded plastic.

What kind of computers are in the sensors?

The computers inside the nodes for this year’s deployment are called “Odroid,” which are single board Linux computers similar to the Raspberry Pi. They use the same processor as is used in many smartphones.

Isn’t there a research trade-off between having the sensors placed around the city randomly vs. strategically?

Depending on the measurement goal, you don’t necessarily need sensors placed everywhere or evenly. Observing lake effects, traffic, and air quality each requires a different strategic configuration and set of sensor locations.

Can law enforcement authorities require you to store data you wouldn’t have done?

Any requests from law enforcement or other government authorities would be handled by the University of Chicago Legal Counsel.

What independent body audits and controls deletion of data?

The Technical Security and Privacy Group will be asked to audit the project at least annually as part of the annual review and reporting process. The project team is responsible for managing data in compliance with the privacy policies. Note that the nodes cannot detect sensitive PII, only images of the public way, which may or may not include faces or license plate numbers. The privacy policy outlines how the project will minimize the potential detection of even this non-sensitive PII.

Would Array of Things data result in isolating/segregating/overemphasizing certain areas of the city--particularly if it’s sliced and diced?

The goal of the project is to measure and understand how cities work in general, rather than focusing on a particular street corner or neighborhood in isolation. This will involve doing experiments in some locations in order to learn principles that can be applied to other, similar, locations. Combined with these experiments the project intends to cover the city with sufficient sensors to see the big picture as well as the details of particular neighborhoods. The selection of sites is also based on interactions with Chicago communities, residents, and other local groups rather than through a top-down decision process.

With the recording of PII, will we be able to make out specific things in the pictures? Who is going to have access to this data? For example, the NSA?

Images of the public way are not considered sensitive PII. Rather, such images are considered to be public information such as can be found in phone books or on public websites, including some features in images such as faces or license plate numbers. This type of information is often referred to as “non-sensitive PII.”

In some cases it will be possible to detect a face or license plate, but the image processing software approval process prohibits the extraction of these or other identifiers from the images. Access to the images, whether those discarded within the nodes or the fraction of 1% that are saved for calibration, will be limited to individuals who have (a) demonstrated scientific need and (b) signed acceptable use agreements.

If the images chosen to train the cameras are random, what value does that have to a scientist?

The value of images for training image processing software is not diminished by randomness.

Wouldn’t it be cheaper just to ask the neighbors if there’s standing water than to have a sensor?

No. The City of Chicago does receive 311 calls for flooding but those calls are not always about standing water in our streets. Additionally, the detection of standing water at the same time as measurement of environmental conditions such as temperature, precipitation, and wind speed will allow researchers to study the causes of urban flooding, and potentially provide insights that allow city departments to anticipate and prevent flooding before it occurs.

What’s the process for addressing issues that sensors might detect? Where I work, people don’t call 311 because nothing happens.

The project data will provide both the public and the city with data that can drive large-scale changes. The city’s interest in sensors is to enable more effective response to various events and conditions and to develop methods for using data to predict and thus more efficiently respond to those conditions.

How is the internet part of the device protected?

The devices employ a variety of common protective measures, including the use of encryption of all communication and the use of a “virtual private network” that limits access to a pre-approved set of computers. The nodes also do not support access initiated from other computers. All communication with the nodes must be initiated by the nodes themselves, which are pre-configured to only connect to the trusted server at Argonne National Laboratory.
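To make the "node-initiated only" idea concrete, here is a minimal Python sketch, not the project's actual code, of a client that opens an encrypted outbound connection only to a pre-approved server and refuses everything else. The host name data.example.org, the port, and the names TRUSTED_SERVERS and open_outbound_channel are placeholders invented for illustration; the virtual private network layer is not modeled. Python's ssl module is built on OpenSSL.

```python
import socket
import ssl

# Hypothetical allow-list; the real server address is not given in this document.
TRUSTED_SERVERS = {("data.example.org", 443)}


def open_outbound_channel(host: str, port: int, timeout: float = 10.0) -> ssl.SSLSocket:
    """Open an encrypted, node-initiated connection to a pre-approved server."""
    if (host, port) not in TRUSTED_SERVERS:
        # Anything outside the pre-configured list is rejected before any traffic is sent.
        raise PermissionError(f"{host}:{port} is not a pre-approved server")
    # The default context verifies the server certificate and host name.
    context = ssl.create_default_context()
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

Note what the sketch leaves out: it never creates a listening socket, so there is nothing for another computer to connect to. All traffic starts from the node side.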

Does the cellular company have access to the data?

No, all data is encrypted so that the cellular company has no way to view it. It would also be illegal for the cellular data company to attempt to do so.

Are the algorithms for image recognition going to be publicly available in a repository?

There will be a repository of algorithms. In most cases they will be open source. The software frameworks supported on the nodes will also be commonly used libraries and tools, thus the algorithms published will be readily reused on other systems.

Will the funding continue for Array of Things if it’s successful?

The program that supports the Array of Things is typically extended if the National Science Foundation determines that the project has scientific merit and serves the scientific community. We believe both of these things will be true of the project.

Are example data sets available?

There is some data from early sensors at the University of Chicago, which only collected environment data such as temperature, humidity and vibration. Units similar to those being installed in Chicago have also been operating at test sites operated by Argonne or universities in non-urban settings including rooftops and grasslands. Requests for this data can be made to aot@uchicago.edu. We expect the first official Array of Things data to be available this fall after the nodes are installed and some testing is done.

The Chicago Architectural Foundation was thinking about using data from smartphones: were you thinking of partnering with them for data collection?

We have regular discussions with CAF and expect to collaborate with them. They displayed an early prototype of an Array of Things node in the City of Big Data exhibit and they intend to highlight the nodes on walking tours. In the current node design, there is no interaction with smart phones.

Please if and when the project closes how will PII be properly disposed of so that it will not later be leaked?

The project will not collect or save any PII. If the project is discontinued, any and all saved images will either be preserved for research under the same privacy and protection policies as the project, or will be deleted, or will be processed by software that removes faces, license plates, or other potentially identifying information.

I think information sharing should be limited carefully. No data should be downloaded to individual personal devices. This sounds a lot like big brother. If the data is there somebody will access and use it.

No data with any information about an individual will be published. All data management and access within the project team is governed by signed ethics and privacy agreements. These agreements include restrictions on where the data may be processed, including prohibition from storing on personal devices of any kind.

It is the following section which causes me the most concern: "The Array of Things technology is designed and operated to protect privacy. PII data, such as could be found in images or sounds, will not be made public. For the purposes of instrument calibration, testing, and software enhancement, images and audio files that may contain PII will be periodically processed to improve, develop, and enhance algorithms that could detect and report on conditions such as street flooding, car/bicycle traffic, storm conditions, or poor visibility. Raw calibration data that could contain PII will be stored in a secure facility for processing during the course of the Array of Things project, including for purposes of improving the technology to protect PII. Access to this limited volume of data is restricted to operator employees, contractors and approved scientific partners who need to process the data for instrument design and calibration purposes, and who are subject to strict contractual confidentiality obligations and will be subject to discipline and/or termination if they fail to meet these obligations." Of course the question becomes how does the public verify precisely who has such access to the PII data? Will access parameters be modified over time? Specifically, what assurances can one gain that the Chicago Police Department, NSA, or other agencies will not have access to this data?

The documents have been clarified to differentiate between “non-sensitive PII” such as can be found in the public domain, and “sensitive PII,” which can identify an individual. The Array of Things has no capability to access or detect sensitive PII, but can detect visual features that are considered to be “non-sensitive PII” such as faces in the public way or license plate numbers.

Although this data is not sensitive PII, the privacy and governance policies nevertheless limit who will have access to it and under what circumstances, and restrict its use to research and development. The policies also outline how even this potential non-sensitive PII will be controlled, audited, and protected. One important role of the independent external team (Technical Security and Privacy Group, Section 3.4 of the governance policy document) is to audit the project with respect to compliance with these policies.

1) How do we submit an official request to participate in the project as a community organization? 2) Can you make a presentation in our community if we coordinate the locations, invites, etc.? 3) FAiR has a group of experts that would like to speak to the project lead persons. How do we coordinate that? 4) Can you please send me the full contact list of the persons managing the project?

The project maintains contact forms on its website, http://www.arrayofthings.us. The project can also be contacted via aot@uchicago.edu.

In the wake of the pullback on current capabilities of the Array, one is still left with the concept of function creep. When new technology is introduced for a stated purpose, this purpose may not be the only purpose the technology is capable of. In other words, the capability profile of the apparatus in question is capable of a high degree of plasticity as viewed over time.

The capabilities have not been pulled back. In early discussions about the project during 2014 the project was considering the use of WiFi and Bluetooth information, such as is routinely gathered by retailers and private entities, to determine an approximation of the number of people at a location. The project elected not to implement this idea because, in addition to potential for misunderstanding, we determined that the technique would not yield valid scientific data.

The approval processes outlined in the published privacy and governance documents specify a review process that examines any new hardware or software feature that would have the potential to deviate from the privacy policy.

Hello there, I've been following AoT for the past two years. Happy to have the opportunity to share my thoughts. Thank you! 1. I have concern for how AoT envisions managing the tricky nature of feedback from the data, and how key variables and interactions will be chosen to formulate a picture of the urban system . . . could new variables chosen to model policy and decision making compromise privacy? 2. We all know cities are a complex system that constantly evolves, so will AoT's foundational pillars of privacy do the same? How could this public concern be quieted? 3. How could AoT blend numerical data and qualitative methods to more holistically craft future privacy policies?

Responses based on numbered questions in the above:

1: The sensor capabilities have all been selected based on input from scientists and policymakers and other members of the public. As the data will be published as open and without charge, multiple groups can and will analyze the data, and the public nature of the data supports a dialog among different parties interpreting the data.

2: The governance and privacy policies, and the open source approach to the technology and the data, are designed to ensure that the privacy policies remain strong and effective.

3: A key objective of this project is to develop privacy policies that are effective while enabling new technologies and scientific approaches to be developed. This includes publication of the policies both to encourage suggestions and to assist similar projects in adopting stronger privacy policies and practices.