I originally posted this essay (now littered with 22-year-old link rot) in response to proposed post-9/11 government national security / surveillance measures. In the wake of the 2025 domestic terror attacks in New Orleans and Las Vegas, and with the ever-opportunistic Donald Trump about to retake the Presidency, I thought it possibly worthy of some renewed reflection. What follows is some relevant cut & paste material from the original.
Under the guise of combating terrorism, our federal government proposes to assemble -- absent probable cause and/or search warrants -- comprehensive investigative data dossiers on ALL American citizens as well as foreigners in the U.S.
From William Safire's recent NY Times editorial (11/14/2002): "...Every purchase you make with a credit card, every magazine subscription you buy and medical prescription you fill, every Web site you visit and e-mail you send or receive, every academic grade you receive, every bank deposit you make, every trip you book and every event you attend -- all these transactions and communications will go into what the Defense Department describes as "a virtual, centralized grand database."
To this computerized dossier on your private life from commercial sources, add every piece of information that government has about you -- passport application, driver's license and bridge toll records, judicial and divorce records, complaints from nosy neighbors to the FBI, your lifetime paper trail plus the latest hidden camera surveillance -- and you have the supersnoop's dream: a "Total Information Awareness" about every U.S. citizen.
This is not some far-out Orwellian scenario. It is what will happen to your personal freedom in the next few weeks if John Poindexter gets the unprecedented power he seeks...."
[ 12/24/02 UPDATE: DARPA/OIA is apparently feeling the heat. The above graphic, which I copied from their website when I first assembled this page, has been toned down on the TIA website, with, among other changes, removal of the phrase "keeping track of individuals." They've also removed the Orwellian "scientia est potentia" logo ("knowledge is power") and bios of TIA principals like Poindexter. Interesting. ]
"...access, receive, and analyze law enforcement information, intelligence information, and other information from agencies of the Federal Government, State and local government agencies (including law enforcement agencies), and private sector entities (emphasis mine), and to integrate such information..."
"...To integrate relevant information, analyses, and vulnerability assessments (whether such information, analyses, or assessments are provided or produced by the Department or others) in order to identify priorities for protective and support measures by the Department, other agencies of the Federal Government, State and local government agencies and authorities, the private sector, and other entities..." (pages 23 and 24)
"...“This bill does not in any way authorize the Department of Defense program known as ‘Total Information Awareness,’” Armey said. “It does not authorize, fund or move into the department anything like it. In fact, this bill provides unique statutory protections that will ensure the Department of Homeland Security could never undertake such a program.” Armey also noted that “references in the bill to data-mining are intended solely to authorize the use of advanced techniques to sift through existing intelligence data, not to open a new method of intruding into lawful, everyday transactions of American citizens.”"
Well, the relevant sections of the HSA do not make that clear, Mr. Armey; in fact, they seem to contradict the assertion. The HSA speaks of integrating data from sources going well beyond "intelligence data" (see above, or better yet, read the Act. A link is provided below.) Moreover, we can be sure that DARPA/OIA will seek to be included in funding allocated under HSA for their little unconstitutional project. The devil will surely be in the details, and the operational details will consist of the endless HSA amendments, appropriations bills, and detailed CFRs (Code of Federal Regulations specs eventually issued for HSA). What is clear at this point is that any logical current reading of HSA tells us that the TIA program falls within the Homeland Security mandate. Confirmation of this last point is seen in remarks made by Under Secretary of Defense for Acquisition, Logistics, and Technology Edward C. "Pete" Aldridge during a November 20th DoD news briefing:
Q: How is this not domestic spying? I don't understand this. You have these vast databases that you're looking for patterns in. Ordinary Americans, who aren't of Middle East origin, are just typical, ordinary Americans, their transactions are going to be perused.
Aldridge: Okay, first of all --
Q: And do you require search warrants? I mean, how does this work?
Aldridge: First of all, we are developing the technology of a system that could be used by the law enforcement officials, if they choose to do so. It is a technology that we're developing. We are not using this for this purpose. It is technology.
Once that technology is transported over to the law enforcement agency, they will use the same process they do today; they protect the individual's identity. We'll have to operate under the same legal conditions as we do today that protects individuals' privacy when this is operated by the law enforcement agency.
Q: So they would need a search warrant, then?
Aldridge: They would have to go through whatever legal proceedings they would go through today to protect the individuals' rights, yes.
Q: As part of this feasibility study, will anybody be looking at legislation, regulation, executive orders that may need to be modified?
Aldridge: I think that's probably an issue that's going to be taken up by the new office of homeland security, who probably will be very much involved in this type -- the use of this type of information.
The link between DARPA's "Total Information Awareness" proposal and the Homeland Security Department (in addition to regular U.S. civilian law enforcement) seems rather clear from those remarks. And, one must ask just how such agencies will "go through whatever legal proceedings they would go through today to protect the individuals' rights" after the TIA data horse is already out of the barn?
No one can question the worthiness of the fight against terrorism. However, the means as envisioned by OIA raise troubling Constitutional and operational questions. Aggregating private personal information for the (sole?) purpose of conducting widespread criminal investigations without probable cause and warrants seems to directly violate the 4th Amendment. Recall from the Bill of Rights: "The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no warrants shall issue, but upon probable cause, supported by oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized." (Amendment IV) It is beyond any dispute, for example, that American authorities may not surreptitiously enter your houses without cause (validated by a warrant), rifle through your belongings, photocopy your papers, extract the data from your personal computers, intercept your emails, and remove this information for criminal investigatory scrutiny. How the proposed TIA program differs materially escapes me. Worse, this is a Defense Department entity proposing to undertake what would be unconstitutional for domestic civilian law enforcement.
Constitutional questions aside, we ought seriously to question the likely operational utility of such an undertaking... hypothetical relative effectiveness scenarios of a TIA "terrorism detection database" under varied input assumptions -- its noxious constitutional implications aside. I have entered the following default values: [A] a population of 240,000,000 (~215,000,000 Americans 18+ yrs of age, plus ~25,000,000 foreigners), [B] 5,000 actual terrorists lurking among them, and [C] & [D] extremely generous (and highly unlikely) 99.9% "accuracy rates" pertaining to "true positives" (terrorists) and "true negatives" (innocent citizens). In such a scenario, counterintuitive though it may be (given a putative "99.9% accuracy rate"), the likelihood that a person flagged by the model is an actual terrorist is -- at best -- approximately 2% (the proportion of true positives in the test-positive subset of the initial population), and this small group will still have to be separated from the nearly a quarter million "false positives" -- i.e., innocent people wrongly identified as terror suspects by a TIA model.
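The arithmetic behind that 2% figure can be checked in a few lines. What follows is only a minimal sketch using the default values [A] through [D] given above; the function and variable names are mine, chosen for illustration.

```python
# Minimal sketch of the default scenario described in the text.
# Names are illustrative; the inputs are the essay's values [A]-[D].

def screening_outcomes(population, terrorists, sensitivity, specificity):
    """Return (true_positives, false_positives, positive_predictive_value)."""
    innocents = population - terrorists
    true_positives = terrorists * sensitivity          # terrorists correctly flagged
    false_positives = innocents * (1.0 - specificity)  # innocents wrongly flagged
    ppv = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, ppv


tp, fp, ppv = screening_outcomes(
    population=240_000_000,  # [A]
    terrorists=5_000,        # [B]
    sensitivity=0.999,       # [C] rate of correctly flagging terrorists
    specificity=0.999,       # [D] rate of correctly clearing innocents
)
print(f"true positives:  {tp:,.0f}")
print(f"false positives: {fp:,.0f}")
print(f"chance a flagged person is a terrorist: {ppv:.1%}")
```

Under these inputs the flagged group holds about 4,995 real terrorists and about 239,995 innocent people, which is where the roughly 2% comes from.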
While the relative "accuracy" (sensitivity & specificity) levels of many clinical methods that estimate disease probabilities (or any type of experimental assay with anterior empirical underpinnings using Bayesian statistical methods [see below] ) are tolerably well-defined (and uniformly well below 99.9%), those pertaining to a TIA program are wholly speculative at this point, and will not clarify for years (if ever). One daunting limitation will come in the form of pervasively inaccurate and/or incomplete data pouring in from the myriad public and private sources. Another will owe to the relative recency and transience of the phenomenon. As Robert Levy of the Cato Institute observes: "Never mind that Pentagon computer scientists believe that terrorists could easily avoid detection, leaving bureaucrats with about 200 million dossiers on totally innocent Americans — instant access to e-mail, web surfing, and phone records, credit-card and banking transactions, prescription-drug purchases, travel data, and court records." (see www.nationalreview.com/comment/comment-levy112602.asp) I could not agree more. While the innocent will more or less simply go on with their customary daily life transactions, our terrorist enemies will undoubtedly take evasive measures. What shall we do? Outlaw, among other things, all anonymous cash transactions? If we don't (and we cannot) the very utility of a TIA database will be fatally compromised at the outset.
Given that no test is infallible, there are inescapable trade-offs in terms of relative false-positive/false negative levels associated with any assessment. For example, where routine workplace drug tests are concerned, labs seek to limit false positives (and the lawsuits they spawn), while they are far less troubled by false negatives (recreational drug users who slip through the screenings). With respect to terrorism, on the other hand, authorities will necessarily fret principally over false negatives -- actual terrorists who go undetected. Should you wrongly end up on a Homeland Security "No-Fly List" or be uselessly visited by a couple of FBI agents in the wake of a false positive TIA "hit", you will likely be met with bureaucratic indifference at best should you protest. At worst, you could be wrongly arrested, have your assets seized, lose your job, or otherwise have your reputation ruined.
What is(are) "Bayesian Statistics"?
Let p(t|+) = the probability of being a true positive ("t", e.g., for this discussion, a terrorist) given a positive TIA finding (+);
Let p(+|t) = the probability of testing positive (+) given that you are in fact a "t";
Let p(t) = the "prevalence" of true positives, e.g., the proportion of terrorists lurking in the aggregate population;
Let p(+|f) = the TIA probability of testing positive (+) given that you are in fact NOT a "t" (i.e., the false positive rate);
Let p(f) = 1 - p(t), the proportion of innocent ("non-terrorists") people in the population.
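Written out with the terms just defined, Bayes' theorem for this problem takes its standard form. The second line plugs in the default scenario's values (5,000 terrorists among 240,000,000 people, 99.9% rates); those numbers are that scenario's assumptions, not measurements.

```latex
\[
p(t \mid +) \;=\; \frac{p(+ \mid t)\,p(t)}{p(+ \mid t)\,p(t) \;+\; p(+ \mid f)\,p(f)}
\]
\[
p(t \mid +) \;\approx\; \frac{(0.999)(2.083\times 10^{-5})}{(0.999)(2.083\times 10^{-5}) + (0.001)(0.99998)} \;\approx\; 0.0204
\]
```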
Look at the factor p(+|f)p(f) in the right-hand side of the denominator (lower right above). Given that the proportion of population true negatives (non-terrorists in this discussion) is indisputably extremely high, p(t|+), the likelihood that a positive TIA finding is a true positive, will necessarily be intractably low, given the relative magnitude of p(+|f)p(f) (unless we have perfect concurrent 100% Sensitivity and Specificity, which, in the real world, will not be the case). Moreover, although the false positive rate p(+|f) is a property of the test, the number of false positives it produces scales with p(f): the more true negatives in the population, the more chances you have to err. Similarly, the fewer true positives in the population, the fewer chances you have to get it right.
This is why we don't test everybody for every disease. This is why we don't test every square foot of the nation in search of pollution.
This is why we have Probable Cause and Warrants codified in the Constitution -- principles apparently lost on the likes of a John Poindexter or a John Ashcroft. In Bayesian terms, "probable cause" serves to ensure that the "prevalence" of guilty individuals in a criminal proceeding is minimally greater than 50%, making us much less likely to wrongly convict someone of a crime (or brand someone as a "terror suspect" behind his or her back after fishing through personal data without constitutional -- rational -- justification).
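To see how much the starting "prevalence" matters, here is a rough sketch comparing a population-wide dragnet with a pool of suspects where guilt is already about as likely as not. The 95% sensitivity and specificity are purely illustrative assumptions, and the names are mine.

```python
# Sketch: the same imperfect test applied at two very different prevalences.
# SENS and SPEC are assumed values chosen only for illustration.

def posterior_guilt(prevalence, sensitivity, specificity):
    """Bayes' theorem: probability of being a true positive given a positive finding."""
    true_hit = sensitivity * prevalence
    false_hit = (1.0 - specificity) * (1.0 - prevalence)
    return true_hit / (true_hit + false_hit)


SENS = SPEC = 0.95

dragnet = posterior_guilt(5_000 / 240_000_000, SENS, SPEC)  # everyone is screened
with_cause = posterior_guilt(0.5, SENS, SPEC)               # prior guilt about 50%

print(f"dragnet screening:   {dragnet:.4%}")
print(f"with probable cause: {with_cause:.1%}")
```

With the same test, starting from a 50% prior leaves a positive finding right about 95% of the time, while the dragnet leaves it right well under 0.1% of the time.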
- What will be the consequences of being wrongly identified (a false positive)? If, for example, you are a false positive for banned objects (e.g., weapons) at the x-ray equipment and/or metal detector at the airport, the error is quickly confirmed and you are on your way. Your identity is not recorded and added to a database. Anyone who has ever falsely tested positive for illicit drug use or has been wrongly arrested, however, can give you a bit of insight into the persistent ugliness that can in fact follow errors by those in authority (See "FBI’s post-9/11 watch list spreads far, mutates" below).
What if a TIA model only achieves modest (though still technically "significant") "accuracy" and "precision" levels? For example, simply decrement the Sensitivity and Specificity levels in my scenario to 99.0%. You then have nearly 2.4 million false positives to weed out. At 95% "accuracy" you would have roughly 12,000,000 people to subsequently surveil and/or interrogate. It quickly becomes logistically untenable. Again, you're stuck with the low-prevalence problem, which trumps any level of "true positive" accuracy.
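The false positive counts quoted above follow directly from the scenario's population figures. A short sketch, assuming the same 240,000,000 people and 5,000 terrorists and setting sensitivity equal to specificity at each level (names are illustrative):

```python
# Sketch: false positives and hit quality as "accuracy" degrades.
# Population figures are the essay's defaults; names are illustrative.

POPULATION = 240_000_000
TERRORISTS = 5_000
INNOCENTS = POPULATION - TERRORISTS


def outcomes_at(accuracy):
    """Return (false_positives, ppv) when sensitivity == specificity == accuracy."""
    true_positives = TERRORISTS * accuracy
    false_positives = INNOCENTS * (1.0 - accuracy)
    ppv = true_positives / (true_positives + false_positives)
    return false_positives, ppv


results = {acc: outcomes_at(acc) for acc in (0.999, 0.99, 0.95)}

for acc, (false_pos, ppv) in results.items():
    print(f"accuracy {acc:.1%}: {false_pos:>12,.0f} false positives, "
          f"{ppv:.3%} of hits are real")
```

Each step down in accuracy multiplies the pile of innocent people to be investigated while the share of real hits in that pile shrinks further.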
FBI’s post-9/11 watch list spreads far, mutates
BY ANN DAVIS, THE WALL STREET JOURNAL
LAS VEGAS — When a patron at the New York-New York casino plugged his frequent player card into a slot machine one day this summer, something strange happened: An alert warned the casino’s surveillance officials that an associate of a suspected terrorist might be on the grounds.
How did a casino’s computer make such a connection? Shortly after Sept. 11, 2001, the FBI had entrusted a quickly developed watch list to scores of corporations around the country.
Departing from its usual practice of closely guarding such lists, the FBI circulated the names of hundreds of people it wanted to question. Counterterrorism officials gave the list to car-rental companies. Then FBI field agents and other officials circulated it to big banks, travel-reservation systems, firms that collect consumer data, as well as casino operators like MGM Mirage, the owner of New York-New York. Other recipients included businesses thought vulnerable to terrorist intrusion, including truckers, chemical companies and power-plant operators. It was the largest intelligence-sharing experiment the bureau has ever undertaken with the private sector.
A year later, the list has taken on a life of its own, multiplying — and error-filled — versions being passed around like bootleg music. Some companies fed a version of the list into their own databases and now use it to screen job applicants and customers. A water-utilities trade association used the list "in lieu of" standard background checks, says the New Jersey group’s executive director.
The list included many people the FBI didn’t suspect but just wanted to talk to. Yet a version on SeguRed.com, a South American security-oriented Web site that got a copy from a Venezuelan bank’s security officer, is headed: "list of suspected terrorists sent by the FBI to financial institutions."
Meanwhile, a supermarket trade group used a version of the list to try to check whether terrorists were raising funds through known shoplifting rings. The trade group won’t disclose results.
The FBI credits the effort, dubbed Project Lookout, with helping it rapidly find some people with relevant information in the crisis atmosphere right after the terror attacks. MGM Mirage says it has tipped off the FBI at least six times since beginning to track hotel and casino guests against the list.
The FBI and other investigative agencies — which were criticized after Sept. 11, 2001, for not sharing their information enough — are exploring new ways to do so, including mining corporate data to find suspects or spot suspicious activity. The Pentagon is developing technology it can use to sweep up personal data from commercial transactions around the world. "Information sharing" has become a buzzword.
But one significant step in this direction, Project Lookout, is in many ways a study in how not to share intelligence.
The watch list shared with companies — one part of the FBI’s massive counterterrorism database — quickly became obsolete as the bureau worked its way through the names. The FBI’s counterterrorism division quietly stopped updating the list more than a year ago. But it never informed most of the companies that had received a copy. FBI headquarters doesn’t know who is still using the list because officials never kept track of who got it. "We have now lost control of that list," says Art Cummings, head of the strategic analysis and warning section of the FBI’s counterterrorism division. "We shouldn’t have had those problems."
The bureau tried to cut off distribution after less than six weeks, partly from worry that suspects could too easily find out they had been tagged. Another concern has been misidentification, especially as multipart Middle Eastern names are degraded by typos when faxed and are fed into new databases.
Then there’s the problem of getting off the list. At first the FBI frequently removed names of people it had cleared. But issuing updated lists, which the FBI once did as often as four times a day, didn’t fix the older ones already in circulation. Three brothers in Texas named Atta — long since exonerated, and no relation to the suspected lead hijacker — are still trying to chase their names off copies of the list posted on Internet sites in at least five countries.
People who’ve asked the FBI for help getting off the bootleg lists say they’ve been told the bureau can’t do anything to correct outdated lists still floating around. The FBI’s Cummings says that "the most we can control is our official dissemination of that list." Once it left the law-enforcement community, "we have no jurisdiction to say, ‘If you disseminate this further, we will prosecute you.’"
CIVIL LIBERTARIANS WORRY
Despite the problems, Cummings and other proponents of information-sharing say the process should be improved, not abandoned. Software companies are rushing to help, trying to make information-sharing easier and more effective.
Systems Research & Development in Las Vegas is among those working on ways to make exchanging law-enforcement and corporate information a two-way street without compromising privacy. "I believe there’s probably 10 to 50 companies in America that across them touch 80 percent to 90 percent of the entire country," says Jeff Jonas, Systems Research & Development founder, citing credit-card companies, banks, airlines, hotel chains and rental-car companies. "There should be a protocol in place that corporate America could be plugged into that allows them to say, ‘We’d like to help,’" he says.
But some officials at the U.S. Customs Service, the Office of Homeland Security and the FBI’s own Criminal Justice Information Services Division doubt the wisdom of circulating watch lists widely, and some say they didn’t even know about Project Lookout.
Civil libertarians worry about enlisting companies to track innocent people for the government. Many companies say they need to be insulated from liability if they’re expected to share data on people with the government. "It’s a tough, tough box to get into. You end up with legitimate concerns about moving into Orwell’s 1984," says Henry Nocella, an official of Professional Security Bureau Ltd. in Nutley, N.J., and a former security director at Bestfoods. "Yet you know there’s a need to collect and analyze information."
‘NOT PLAYING GAMES’
Before Sept. 11, 2001, the government rarely revealed the names of terrorism suspects to companies. The exception was when it had a subpoena for specific information the government believed a company had about a person under investigation. But after the attacks, counterterrorism officials were concerned that members of terrorist cells could have slipped undetected into companies or communities. They feared that by the time they figured out where to direct subpoenas, the suspects could get away or even stage another attack. Holed up in a "strategic information and operations center" in Washington, a small circle of FBI officials decided on Sept. 15, 2001, to put out a broad heads-up to state and local police and to trusted companies. "We’re not playing games here. This was real life. We wanted as many people as possible to know this is who we wanted to talk to," says Steven Berry, an FBI spokesman.
Agents cast a wide net that, by its nature, included scores of innocent people.
They started by using record searches and interviews to identify "anybody who had contact" with the 19 hijackers, Cummings recalls.
Kevin Giblin, chief of the terrorist warning unit, decided that car-rental companies and local police should be the first outside of the airlines to get the list. One firm that received it, Ford Motor Co.’s Hertz unit, says it checked the list against its records and told the FBI of any matches, but then basically let the list lie dormant. Trade groups proved a quick way to spread the word. The FBI gave the list to the Transportation Department. It shared the names with the American Trucking Associations, which promptly e-mailed the list to nearly 3,000 trucking companies. The International Security Management Association, an elite group of executives at 350 companies, put the list on a password-protected part of its Web site, allowing members to scan it in private, members say.
‘WASN’T A BLACKLIST’
On their own, FBI field agents shared the list with some chemical, drug, security-guard, gambling and power-plant companies, according to interviews with companies. The FBI’s Giblin says he hadn’t realized how extensively field agents distributed the list. But he says agents have considerable autonomy and are expected to keep close ties to companies in their area. Giblin says the bureau stressed to recipients that the people named weren’t all suspects. "This wasn’t a blacklist," he says. By the time the FBI tried to close out its list, at least 50 versions were floating around, say people who saw numbered ones. Some companies were asking software firms such as Systems Research & Development how to make better use of the lists. The company, which is financed in part by a venture capital arm of the Central Intelligence Agency, has a program called NORA, for Non-Obvious Relationship Awareness. It mines data to detect hard-to-see links between people, such as use of the same residence or phone number.
Giblin says when he fields tips nowadays from companies that have the watch list, he tells them it’s obsolete. But not all field offices turn down such tips.
If the government does decide to disseminate watch lists in the future, it won’t face high legal hurdles, says Daniel Ortiz, a law professor at the University of Virginia. He says someone who appears wrongly on a watch list could ask for a correction but couldn’t prevent the list’s circulation or sue the government for damages under current privacy laws. The government just has to be careful not to single people out solely on race or ethnicity.
Businesses face more jeopardy, however. Many industries, such as cable companies and banks, operate under special privacy laws preventing them from giving customer information to the government without a subpoena.
___
About the author
I have been working with analytical data for the past 16 years, in four disparate domains: [1] forensic-level environmental radiation and mixed waste analysis, [2] industrial “Predictive Maintenance” (PDM) diagnostics, [3] Nevada Medicare hospitalization outcomes investigations, and (for nearly the past three years to date) [4] credit risk management in a subprime demographic (people who perhaps shouldn’t even be granted credit). My training and experience with both the theory and practicalities of data logistics and assessment are at once broad and deep.
My tenure in radioassay was one in which you frequently had to justify every for-the-record digit to the satisfaction of a seemingly endless horde of auditors (many of whom served the potentially legally liable parties eager to discredit your work). Put down “2.7 pCi/kg” on a report and you could expect to be called upon to demonstrate that your records scientifically verified your bench-level operational ability to distinguish between “2.6” and “2.8”. “Significant figures” rounding was a routine contractual stipulation, one subject to ongoing verification.
During my PDM tenure, it quickly became obvious that, were one of our digital FFT monitor-analyzers to prove inaccurate and permit, say, a power plant turbine bearing or shaft to fail without warning, huge sums might be lost, and people might die (and we might be subsequently sued out of existence). Our engineers and programmers, consequently, personified the term “fastidious.”
Next: Nevada Medicare, and a rude empirical awakening. The U.S. Health Care Financing Administration (HCFA) quietly internally acknowledged that the hospitalization data we had to work with at the Nevada Peer Review was perhaps only “~80% accurate.” Medical charts were shot through with inaccuracies and omissions owing to realities such as the vagaries of administrative ICD-9 and DRG coding and the chronic inscrutability of clerical and/or physicians’ penmanship. A staple of designing Peer Review statistical evaluation projects was compensatory “20-25% oversample” for chart abstraction and review.
Now I work in revolving credit risk assessment (a privately-held issuer of VISA and MasterCard accounts), where our department has the endless and difficult task of trying to statistically separate the “goods” from the “bads” using data mining technology and modeling methods such as factor analysis, cluster analysis, general linear and logistic regression, CART analysis (Classification and Regression Tree) and related techniques.
Curiously, our youngest cardholder is 3.7 years of age (notwithstanding that the minimum contractual age is 18), the oldest 147. We have customers ostensibly earning $100,000 per month -- odd, given that the median monthly (unverified self-reported) income is approximately $1,700 in our active portfolio.
Yeah. Mistakes. We spend a ton of time trying to clean up such exasperating and seemingly intractable errors. Beyond that, for example, we undertake a new in-house credit score modeling study and immediately find that roughly 4% of the account IDs we send to the credit bureau cannot be merged with their data (via Social Security numbers or name/address/phone links).
I guess we’re supposed to be comfortable with the remaining data because they matched up -- and for the most part look plausible. Notwithstanding that nearly everyone has their pet stories about credit bureau errors that gave them heartburn or worse.
12/26/02 UPDATE: see www.consumerfed.org/121702_creditscorereport.html for the latest on the persistent extent, and the actual and potential negative impacts of credit bureau inaccuracies.
In addition to credit risk modeling, an ongoing portion of my work involves cardholder transaction analysis and fraud detection. Here again the data quality problems are legion, often going beyond the usual keystroke data processing errors that plague all businesses. Individual point-of-sale events are sometimes posted multiple times, given the holes in the various external and internal data processing systems that fail to block exact dupes. Additionally, all customer purchase and cash advance transactions are tagged by the merchant processing vendor with a 4-digit “SIC code” (Standard Industrial Classification) categorizing the type of sale. These are routinely and persistently miscoded, often laughably. A car rental event might come back to us with a SIC code for “3532- Mining Machinery and Equipment”; booze purchases at state-run liquor stores are sometimes tagged “9311- Taxation and Monetary Policy”; a mundane convenience store purchase in the U.K. is seen as “9711- National Security”, and so forth.
Interestingly, we recently underwent training regarding our responsibilities pursuant to the Treasury Department’s FinCEN (Financial Crimes Enforcement Network) SAR program (Suspicious Activity Reports). The trainer made repeated soothing references to our blanket indemnification under this system, noting approvingly that we are not even required to substantiate a “good faith effort” in filing a SAR. In other words, we could file egregiously incorrect information that could cause an innocent customer a lot of grief, and we can’t be sued.
He accepted uncritically that this was a necessary and good idea.
You just watch. The Homeland Security Act and its eventual amendments and CFRs, along with those pertaining to TIA, will also certainly contain such blanket liability immunity provisions.
We know why.
Robert E. Gladd, MA/EPS, CQE
Las Vegas, NV