Channel: Palantir

Our first-ever Intern Night Live

Just a week after the last group of summer interns arrived, we hosted their official welcome to the company at Intern Night Live. The event featured founder Stephen Cohen, who shared how his own experience as an intern at Clarium Capital led to the founding of Palantir.

The event was like Palantir Night Live—complete with signature mini-sliders and a speaker—but with a few twists. Our very own Internal Tools Developer Shane Knapp set the stage, spinning a DJ set as interns completed a trivia challenge and chatted with current Palantirians.

Ari Gesher, Senior Software Engineer and Palantir veteran, outlined the history of the company for the interns, touching upon both the finer points of the development of Palantir’s software and the history of the offices we call “The Shire.” Did you know one of our buildings was the birthplace of the programming language we now know as Java?

The night came to a close with a raffle and the interns left their mark by signing one of the Palantir walls.

Check out all the Intern Night Live Pictures


Shyam Sankar speaks at TEDGlobal 2012

TED invited Shyam Sankar, Director of Forward Deployed Engineering, to speak at TEDGlobal 2012 in Edinburgh, Scotland. Shyam used the opportunity to discuss Human-Computer Symbiosis: the idea that technology should be designed in a way that amplifies human intelligence instead of attempting to replace it. He explained the concept, which is core to the development of Palantir’s software, by using the canonical example of chess. He told the stories behind two classic encounters between man and machine: the 1997 match in which IBM’s Deep Blue supercomputer defeated chess grandmaster Garry Kasparov, and a 2005 freestyle tournament in which two amateur players using three weak laptops defeated all comers, including grandmasters armed with supercomputers. (For more on these examples, see earlier posts here and here.)

Shyam went on to expand on the relevance and impact of Human-Computer Symbiosis today with the emergence of Big Data and related technologies. Responding quickly to victims of the Haiti earthquake, making sense of complex documents found in an Al-Qaeda house, designing the 9/11 Memorial—these are all tasks that are best tackled by a nimble mind and powerful technology working in concert.

Shyam’s visual presentation was praised as one of the best at the conference, so much so that it earned its own spot on the TED blog. Designed in-house by Collin Roe-Raymond and the Design team, the keynote was itself an example of human expertise and technology coming together in stunning fashion.

Check out the TED blog for a full recounting of Shyam’s talk.

Dinner with Peter Thiel and the Exploratorium: an Intern’s Saturday Night

Peter Thiel. Twenty-foot tornadoes to touch. Bite-sized PB&J macaroons. No, you’re not dreaming. You’re at Palantir Intern Night 2012.

On July 14, interns from across the Bay Area experienced a one-of-a-kind evening at the Exploratorium in San Francisco. Our intern class, along with friends from around the Valley, enjoyed dinner and an address from Thiel, who shared Palantir’s founding story and answered questions from the interns. One intern asked why Thiel chose to name the company “Palantir,” given the sometimes negative associations of the seeing stones in The Lord of the Rings.

Thiel paused for a moment before launching into his response:

“Well, in the First Age of Middle Earth, Palantirs were indisputably good. I would argue that in the Second Age of Middle Earth […] they were as well.”

After regaling the audience with a brief history of Middle Earth, he closed his explanation with some philosophy:

“In the Third Age under Sauron, they were used for evil, and really that just reminds us that there’s great responsibility that comes with power and that anything can become corrupted if we’re not careful.”

This context provides a telling link between Palantir’s name and the company’s commitment to protecting Privacy and Civil Liberties—a commitment that has been central to Palantir since its earliest Age.

After his talk, Thiel hosted the interns at his house for chess and an impromptu rooftop dance party, truly making the night a nerd’s dream come true.

Check out all the Intern Night 2012 Pictures

Journalist uses Palantir to investigate illicit Human Tissue Trafficking

Palantir donated software to the International Consortium of Investigative Journalists (ICIJ) in support of a four-part investigative series on the global trade and illegal trafficking of human tissue. ICIJ broke the story at the Google Ideas Summit on Illicit Networks on July 17, 2012. The series investigates how human tissue is taken from the dead, how the tissue moves through illicit networks, and how the use of this tissue in medical procedures is affecting patient safety. In response to the stories, the World Health Organization is planning to create a coding system to better track the trade of human tissue.

During the Summit, Palantir hosted a lab that showed attendees how the journalists used Palantir Gotham to help with their investigative reporting.

ICIJ investigative stories have global impact. Working with organizations like ICIJ is central to Palantir’s mission of making a difference by addressing today’s most critical challenges and empowering people to interact with their data in ways never before possible.

View the live presentation:

Additional Coverage:

NPR – Jul 17-19, 2012

Huffington Post – Jul 18, 2012

ICIJ – Jul 17, 2012

MSNBC – Jul 17, 2012

Sydney Morning Herald – Jul 16, 2012

Find out more about what we do, and how you can get involved.

Giving back in our own way: the Philanthropy Team

Since our earliest days, the people that make up Palantir have always been passionate about two things: building the best technology to change the way that people relate to data and deploying that technology to the organizations with the data and mission to make a real difference.

As we built our business, we started looking for ways to give back and hit on an interesting strategy: we’d look for non-profits that had data about important problems and offer them our software and expertise. Our CEO, Dr. Karp, pledged that we would seek to donate between 10 and 20 percent of our revenue in kind to world-changing organizations.

As our philanthropic mission has continued, we have decided to create a full-time Philanthropy team (which is currently hiring in our Palo Alto and McLean offices).

Read on to see a few examples of our past work.

A Few Examples

National Center for Missing and Exploited Children (NCMEC)

One of our first philanthropic partnerships, initially staffed with volunteers from our engineering and business teams, was with the National Center for Missing and Exploited Children (NCMEC). NCMEC is a private, nonprofit organization that serves as the United States’ resource on the issues of missing and sexually exploited children. The Center provides information and resources to law enforcement and other professionals, parents, and children, including child victims.

In their mission to aid local law enforcement in quickly solving child abduction cases, NCMEC uses the Palantir Gotham platform to shorten their response time during the critical hours following the disappearance of a child. At our GovCon 7 conference, NCMEC spoke about how Palantir’s technology has been transformative to this very important mission.

Supporting Analysis of the Sinjar Records

Supporting non-profits that analyze conflict is a key element of Palantir’s philanthropic strategy. Whether the partner is a human rights organization or a think tank, we believe that open source analysis of violent conflict is critical for public awareness and understanding. One of the earliest examples was our 2008 support for the Combating Terrorism Center at West Point, which at the time was analyzing a fascinating dataset of personnel records collected by al-Qaeda’s organization in Iraq. The academics at West Point had already analyzed the data statistically; with Palantir’s support, they were also able to uncover hidden connections in the data, most importantly the key smugglers in Syria who were ushering foreign fighters into Iraq. They recognized that a strategy to prevent al-Qaeda fighters from entering Iraq could focus on these specific nodes, and by using Palantir to understand the payments made to the smugglers, they were able to suggest different strategies for disrupting the various smuggling networks.

The Center for Public Integrity

Palantir has worked with The Center for Public Integrity (CPI) on a couple of projects. The first looked at predatory loans in the subprime mortgage market during the last housing bubble; our Horizon technology (part of Palantir Gotham) made it possible to interactively filter 350 million mortgage records to find bad actors. The second, in conjunction with The International Consortium of Investigative Journalists (ICIJ), looked into the true timeline of the tragic Daniel Pearl kidnapping and murder, which differed significantly from the official version of the story.

We look forward to telling our stories here, in this blog. Stay tuned, we’re just getting started.

Guava Collections Subtleties


At Palantir, we write most of our code in Java, but we miss functional language features like map and filter for working with collections of objects. To make up for that, we use Google’s Guava libraries, but occasionally they don’t behave quite how we expect.

Read on to see one of our Guava puzzlers.

The Puzzle

Can you figure out why the following test doesn’t pass?

public void testTreeBuild() {
    // Create root.
    Node root = new Node(0L);
    // Create a list of ids.size() children.
    Collection<Long> ids = Lists.newArrayList(1L);
    Collection<Node> children =
        Collections2.transform(ids, new Function<Long, Node>() {
            public Node apply(Long id) {
                Node n = new Node(id);
                return n;
            }
        });
    // Give every child a grandchild.
    for (Node child : children) {
        Node grandchild = new Node(2L);
        child.addChild(grandchild);
    }
    // Attach every child to root.
    for (Node child : children) {
        root.addChild(child);
    }

    // Validate that root has children.
    assertEquals(1, root.numChildren());
    // Validate that children have children.
    for (Node child : root.children) {
        assertEquals(1, child.numChildren());
    }
}
private static class Node {
    public long id;
    public List<Node> children = Lists.newArrayList();
    public Node(long id) {
        this.id = id;
    }
    public void addChild(Node child) {
        this.children.add(child);
    }
    public int numChildren() {
        return children.size();
    }
}


Solution

Many of Guava’s library functions, including Collections2.transform(), provide a live view of the underlying collection. Most of the time this is a feature – when you update the underlying collection, you get an updated derived collection. The downside is that every time you perform an operation on the derived collection, it performs the operation on the underlying collection and then executes the function on that result. In this case, when we iterate over the children collection to give every child a grandchild, we invoke the node creation function once per id and create one new node. Later, when we attach each child to the root, we create new nodes again for each id, so the nodes that have grandchildren are not the same as the nodes that are attached to the root. This means that the last assertion of this test fails.

Best practices for derived collections that compute views on underlying collections:

  • Don’t use functions that have side effects
  • Don’t use functions that are slow
  • Don’t create objects in functions
  • If you want to use a function that doesn’t obey the above rules, immediately create a new collection so that the function is invoked exactly once per object, for example: Lists.newArrayList(Collections2.transform(ids, function))
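The lazy-view behavior can be reproduced in a few lines of plain Java, which makes the bug easy to see in isolation. The sketch below is our own illustration, not Guava code: `LazyViewDemo`, `lazyTransform`, and the invocation counter are invented for the demo, with `lazyTransform` standing in for `Collections2.transform()`. It shows that iterating the view twice invokes the function twice per element, while an eager copy invokes it exactly once per element:

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class LazyViewDemo {
    // Counts how many times the transform function is invoked.
    static int applyCount = 0;

    // A lazy view in the style of Guava's Collections2.transform():
    // the function runs on every access, so each iteration pass
    // re-creates the transformed objects.
    static <F, T> List<T> lazyTransform(List<F> source, Function<F, T> fn) {
        return new AbstractList<T>() {
            @Override public T get(int i) { return fn.apply(source.get(i)); }
            @Override public int size() { return source.size(); }
        };
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(1L, 2L, 3L);
        Function<Long, String> fn = id -> { applyCount++; return "node-" + id; };

        List<String> lazy = lazyTransform(ids, fn);
        for (String s : lazy) { }  // first pass invokes fn 3 times
        for (String s : lazy) { }  // second pass invokes fn 3 more times
        System.out.println("lazy view invocations: " + applyCount);  // 6

        applyCount = 0;
        // Eager copy: fn runs once per element during the copy, and
        // later iteration reuses the same objects.
        List<String> eager = new ArrayList<>(lazyTransform(ids, fn));
        for (String s : eager) { }
        for (String s : eager) { }
        System.out.println("eager copy invocations: " + applyCount);  // 3
    }
}
```

This is exactly what the last bullet above recommends: wrapping the view in a concrete collection (Lists.newArrayList(Collections2.transform(ids, function)) with Guava) freezes the results, so the nodes given grandchildren are the same nodes attached to the root.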

Securely collaborating across the enterprise and with external partners to expose cyber fraud

In an earlier demonstration on this blog, we showed how a single analyst used Palantir Metropolis to uncover an actual cyber threat at one of Palantir’s largest commercial deployments. However, in many large financial institutions, detecting complicated schemes requires the work of multiple analysts across the enterprise. Collaboration is critical, but the need to enforce data access restrictions can impede cooperative analysis across groups. In response to this need, Palantir has made secure information sharing a possibility within the organization and with external community members. Watch as we demonstrate how multiple analysts at one of the world’s largest financial institutions can collaborate to expose cyber fraud.*

*While this demonstration is based on a real investigation workflow, the data has been anonymized, and any resemblance to real people or entities is coincidental.

A Study in Cell Phones

A recent story in the New York Times, “More Demands on Cell Carriers in Surveillance,” describes the response by cellular service providers to an inquiry from Rep. Edward J. Markey, in which the service providers reported on the frequency of requests for subscriber information by law enforcement agencies. In 2011, carriers responded to more than 1.3 million requests for subscriber information (anything from billing address to geolocational information based on GPS and cell tower hits).

Law enforcement requests for information from telecommunications providers are certainly not new and are frequently an important tool in the effort to track and apprehend criminals. Mobile communications information is particularly valuable given that, as one detective quoted in the article points out, “At every crime scene, there’s some type of mobile device.” The ability to gather more information related to these mobile devices may yield more clues to help solve a crime. On the flip side, more of this kind of information (combined with more analytic power) may also increase the potential for privacy and civil liberties violations. Individuals may be improperly scrutinized, or detailed inferences irrelevant to the investigation may be drawn about their private lives.

Beyond this larger question, however, the article offers us great examples of several of the privacy and civil liberties issues that are common throughout the data analytics world.

“Sprint and other carriers called on Congress to set clearer legal standards for turning over location data, particularly to resolve contradictions in the law.”

As noted in an earlier post on this blog, the law often lags significantly behind the pace of technological development. As new means of communication develop and new information can be generated from those means (e.g., geolocational data), law enforcement is forced to try to figure out just how a law written in 1986 should apply to 2012 technology. The end result is usually confusion, with both law enforcement and the cellular carriers unsure of how to comply with the law. Explicit guidance is likely to come months or years later in the form of a court decision—if at all.

“When a police agency asks for a cell tower ‘dump’ for data on subscribers who were near a tower during a certain period of time, it may get back hundreds or even thousands of names.”

The imprecise handling of data often results in too much information being produced in response to a request (see, e.g., the overproduction of information in response to National Security Letters). This is an issue that can often largely be addressed through technical means. Just because a high volume of data exists does not mean access to it cannot be controlled with precision. Complex searches can be conducted to return only the information directly relevant to a particular investigation. Personally identifiable information can be protected so that analysis can be conducted without revealing identifying information until a reasonable evidentiary threshold has been crossed. In short, it should now be possible to sift through data and pick out relevant threads of information without ever exposing the rest of the data set to human eyes. Ideally, this granular sifting of information would occur before the data is passed to law enforcement. However, accompanied by a credible and transparent review process and oversight regime, this data control could allow for post-sharing filtering that could reassure the public that information is being protected to the greatest extent possible.

“Because of incomplete record-keeping, the total number of law enforcement requests last year was almost certainly much higher than the 1.3 million the carriers reported….”

Information on how data is used is often lacking, but modern information systems increasingly generate “data about data” that should make it easier for data stewards to provide information about what they store, use, and share. Generating and analyzing this data can provide critical information that can not only be used to better protect privacy and civil liberties but can also lead to better, more efficient analysis. What information is being shared with whom? How often does a certain type of data get used? How often does data of a certain age get used? How successful has a particular type of analysis been? Answering these questions can lead to better data handling policy as well as help redirect analytic resources along more effective lines.

“Chris Calabrese, a lawyer for the A.C.L.U., said he was concerned… about the agencies then keeping those records indefinitely in internal databases.”

When is information no longer worth keeping? How valuable is a ten-year-old piece of data when there is a possibility that more sophisticated analysis might—just might—unearth the first link in the chain of evidence that could solve a serious crime or prevent a terrorist attack? Alternatively, how much is an individual damaged when personal but non-criminal information about him or her is held in a government database, potentially leading to significant stigmatization if the individual’s presence in that database is revealed? What looks like a simple cost-benefit analysis is made more complicated by uncertainty over the value of the information being retained. The kinds of metadata and metrics discussed above might go a long way toward providing meaningful quantitative information about how often information of a certain age from certain sources is actually used, thus contributing to an informed, reasonable retention policy.

Volumes can be and have been written on each of these issues, and we will frequently return to them in the pages of this blog, including some looks at how Palantir can address these issues in a way that mitigates concerns about privacy and civil liberties. We believe that none of the issues described here is insurmountable. These challenges can be addressed through close engagement between policymakers and technologists, thus allowing law and policy to be informed by technical feasibility and new technology to be developed from an early stage with policy needs in mind.


Adaptive Management and the Analysis of California’s Water Resources

Water resource management in California is a precarious and costly balancing act. Various federal, state, and municipal organizations have a stake in the management of California’s water resources. In the case of the Sacramento River Delta, they all compete to manage a single resource. Decisions made about the Delta affect millions of Californians, as well as the endangered species in the Delta’s delicate estuarial ecosystem, such as the Delta smelt. It is therefore critical that these decisions be based on transparent, reproducible, and comparable analyses of the best available data.

In this demonstration developed by Palantir and environmental consultants from NewFields, we show how Palantir’s data fusion platforms can help tackle different facets of the adaptive resource management problem. We use the Palantir Gotham platform to map out the relationships of the various organizations managing the Delta, as well as the documents they publish and the data sources they maintain. With Palantir Metropolis, we use data from monitoring stations that are scattered throughout the Delta to analyze relationships between smelt abundance, salinity (and an associated metric called X2), and other physical factors in the Delta such as temperature and turbidity (cloudiness of the water). The Palantir Metropolis platform offers a means to compare scientific analyses at the high level of granularity needed to make critical management decisions. Users can conduct and modify competing analyses side by side to easily see where different models or underlying data diverge and lead to different conclusions.

The Chart application in Palantir Metropolis allows users to share their analyses and conclusions quickly and easily. In this case, an analyst displays the changes in water salinity over time.

This kind of analysis can give policy makers maximum insight into the relationships between the variables that affect the Delta’s health and allow them to make decisions that appropriately weigh the interests of all parties involved.

Product Support at Palantir: Let Me Help You With That

(Comic courtesy of XKCD, via Creative Commons License)

When a user has to contact Product Support for help, he or she already has a problem. Dealing with Support should never add aggravation on top of that. Slogging through scripted steps that have nothing to do with the real problem or wading through tiers of call center workers to find someone who actually understands the situation is a frustrating waste of the user’s time, and it isn’t any fun for the Support staff either.

At Palantir, we work hard to make sure Product Support is a solution, not a chore. Our users have already said “shibboleet” just by contacting us. We don’t have tiers and we don’t have scripts. Every member of the team is a full-fledged engineer and product expert. It’s our mission to use our expertise to help our users succeed.

Palantir’s data fusion platforms are powerful and have applications across a broad range of use cases. They are used in the intelligence, defense, and law enforcement communities as well as in financial institutions and health organizations. Our software is applied in each of these domains to solve complex and important problems, so if a customer hits a snag it’s likely to be complex too. These aren’t the sort of issues that can be fixed by following a script—they require genuine problem-solving expertise. So what do we do when confronted with a truly perplexing problem to solve? This post explains how we handle these situations at Palantir.

How to Solve Any Problem Like a Palantir Product Support Engineer
Our approach is a four-step process:

  1. Gather Information
  2. Isolate the Cause
  3. Provide a Solution
  4. Record for Posterity

You will become a detective, a scientist, an inventor, and an archivist in turn. I’ll illustrate with a hypothetical problem in which a user is trying to import a spreadsheet:

Step One: Gather Information
During the first step, you are a detective, interviewing witnesses and collecting evidence with the goal of building a complete and accurate picture of what happened. And as a detective, you are interested in just the facts. Don’t be led astray by speculation or make unfounded assumptions. Testimony is valuable, but evidence is far superior. Ask the user what happened, but also check the logs. It’s not that users are malicious or dishonest—far from it. They’ve come to you for help, so they are motivated to find a solution. But they don’t have your deep understanding of the product. It’s easy for them to have inaccurate mental models of the software, or use the wrong terminology in describing their experience. Trust your users, but verify their claims.
Suppose you receive an email from a user that says the following:

“Dear Product Support: I’m trying to import a spreadsheet, and it isn’t working. I’m just getting an error message that says the file can’t be imported. Can you help me?”

The first thing you need to do is understand what you do and do not know, keeping in mind that the latter category will usually be much larger than the former. Here’s what you do know in this case:

  • The user is failing to import a spreadsheet.
  • The user is receiving an error message.

And here’s a sampling of relevant information that you don’t know:

  • The exact contents of the error message.
  • Whether any errors are being written to log files.
  • Whether the user can import other files.
  • Whether the spreadsheet file is malformed or corrupt.
  • How large the spreadsheet is.
  • How the spreadsheet’s contents are structured.
  • Whether the user has previously been able to import spreadsheets like this one.
  • How the user is attempting to import the spreadsheet.
  • What version of your software the user is running.
  • Whether the user’s administrators have made any policy changes that would prevent file imports.
  • Whether there have been any recent changes in the user’s environment.

Once you’ve identified what you don’t know, it’s time to figure out where you can get that information. There are a lot of potential sources of knowledge. Here are some to start with:

  • You can ask the user reporting the issue questions like, “What is the exact text of the error message you are seeing?”
  • You can check the product documentation for answers to questions like, “What are the methods for importing a spreadsheet, and how are they expected to behave?”
  • Your bugtracker can yield answers to questions like, “What problems have caused spreadsheet imports to fail in the past?”
  • It’s often useful to check with your fellow engineers by asking questions like, “Hey, anybody have experience with failed spreadsheet imports?”
  • Developers can answer questions like, “What assumptions does our import code make about spreadsheets?”
  • Domain experts can be helpful for questions like, “What tricky things can be done with spreadsheets that prevent them from being recognized correctly in other programs?”
  • Finally, some information can be found with some quick internet research, such as “What popular formats exist for spreadsheets?”

It is vital for your communications to be clear and specific during this process. Vague questions will receive vague answers. You should also require clarity and specificity from the user. If there is any ambiguity in his or her answers, ask for more information—don’t assume you know what he or she means. It’s easy to make incorrect assumptions that lead you to attempt to solve a problem that doesn’t even exist. Pay extra attention to anything that seems unusual or out of place. These are often good jumping-off points for further investigation, as they frequently indicate a hidden assumption that needs to be overturned to reveal the truth.

Step Two: Isolate the Cause
You’ve gathered all the information that is known about the problem. Now you must be a scientist, formulating theories and designing experiments to test them, with the goal of understanding exactly what part of the user’s process is breaking down and why. And as a scientist, you must not become too attached to any particular theory. If an approach isn’t panning out, abandon it and try another one. It’s important that you not blind yourself to the actual cause of the problem by fixating on something that turns out to be irrelevant.
The first thing you need to do in this step is reduce the problem space by eliminating potential sources of the problem until you drill down on the exact cause. For every possible cause, find a test to perform which will have a different outcome depending on whether that cause is the true cause. This will require some creative thought, but here are a few examples:

  • The spreadsheet file might be corrupted. Try opening the spreadsheet in other programs.
  • The user might be insufficiently privileged to import files. Have the user try to import a known good file.
  • The high number of images in the spreadsheet might be preventing it from being recognized properly. Save a copy without the images and try importing that.

Ultimately, you are looking for the exact set of steps that are necessary and sufficient to reproduce the problem, so it’s important that your experiments provide you with new information. Try to produce evidence that contradicts your theory instead of seeking out data that supports your theory. It’s easy to find data that is consistent with multiple explanations of a problem, and thereby seems to support even incorrect theories. This is why you should use another program to try to open a spreadsheet you think might be corrupted, instead of merely trying to import a different spreadsheet. Whatever happens with the second spreadsheet proves nothing about the first, but if the first opens successfully in the other program, it must not be corrupted. Similarly, if you want to test the theory that the spreadsheet is too large a file, you may be tempted to have the user try to import a smaller spreadsheet—but whether this succeeds or fails, it doesn’t tell you if the first spreadsheet was too large. Try to import a larger spreadsheet. If that succeeds, then you know the first spreadsheet is not too large after all.
At every turn, try as aggressively as possible to prove yourself wrong. Attack every theory in such a way that only the truth can survive, and that’s exactly what you’ll be left with.

Step Three: Provide a Solution
Once you know what is blocking the user’s workflow, it’s time to become an inventor and devise a solution. You must apply your skills, knowledge, and creativity to find a way for the user to reach his or her goal. Be resourceful and flexible; adapt to the issue and work around it with any of the tools at your disposal. Sometimes this step will involve sending the user a plugin that expedites the desired workflow. In extreme cases it may involve shipping a patch. Most of the time, however, the user just needs a backup plan. You’ll need to craft one that matches the user’s specific problem, but here are a few illustrative examples:

  • The spreadsheet is too large. Break it up into multiple spreadsheets and import it piecemeal.
  • The spreadsheet’s structure is confusing the file importer. Reorganize it to a more standard structure.
  • Normal GUI import methods don’t work. Use a different method, such as importing from the command line.
  • The user’s current software version cannot import the file. Upgrade to a more recent version that can perform the import.

In articulating this plan to the user, you must ensure that you provide the necessary context. Anticipate the “Why?” questions the user may ask, and answer them preemptively. It’s also essential to keep perspective and not to get hung up on irrelevant details or confuse exactly which problem you’re trying to solve. If the user just needs to get the information from this spreadsheet into the system, that’s your goal—not importing a file of a particular format. If the user can save the spreadsheet as a CSV and get rolling that way, that is a perfectly valid solution.

Step Four: Record for Posterity
With the user satisfied, it is now time to become an archivist. The hard work is over; all that’s left to do is make sure this work is never hard again. The problem has been solved, so no one else should have to come in blind. The goal is to make sure the information you wish you’d had when you started investigating is easily available for the next person who runs into the same problem. Gather all of the following information and anything else that would be helpful:

  • The version of the software on which the problem occurs.
  • The minimal list of steps required to trigger the problem.
  • Any relevant program settings or other environmental considerations.
  • Any data or external files involved (such as a sample spreadsheet).
  • Any workarounds or solutions found.

Record all of this data in the bugtracker. If it is relevant, get it added to the product documentation or public knowledge base as well. It’s easy for this to seem unimportant in the moment, but it’s essential for the team going forward. Write down enough that the next person can solve the problem from your notes alone. Your teammates, and indeed your future self, will be grateful.

If this process uncovered a product bug, this is also the time to file it with the product development team. Make sure to supply the same information here as well—the developers shouldn’t have to repeat any of your work. The less they have to do to find the bug, the faster they will repair it, and the sooner you can stop receiving emails and calls from users about it.

And there you have it! The Palantir Product Support approach to problem solving. It’s a lot more satisfying than running someone through a script and then putting them on hold!

While these steps are universal, applying them requires creativity and flexibility. There’s no surefire checklist that fixes everything. Most of the issues we face are far more complex and interesting than a stubborn spreadsheet, and every new issue means defining new diagnostic tests and finding new resolutions—and often, learning new skills. Being an engineer on Palantir’s Product Support team means continually broadening your horizons to help real people solve important problems.

We’re always on the lookout for more talent. If this sounds interesting to you, head on over to our careers page—maybe we’ll be working with you soon!

How to Rock a Developer Phone Interview


We wrote a popular series of posts last year on how to rock Palantir on-site interviews. However, this advice does you no good if you don’t make it past the first hurdle: the phone interview. In this post, I’d like to give you some simple tips to maximize your chances on the phone.

The Basics

The tips in this section will seem trivial, but you’d be surprised at how many candidates mess up the basics. It’ll be much harder for you to show us how awesome you are if you don’t do these things:

  • Find a quiet, comfortable place to work. You should expect to be in an environment where you feel comfortable solving problems. If you’re in a busy or noisy area, it’s going to be hard for you to concentrate and difficult for your interviewer to understand you.
  • Make sure your phone works. This should be self-explanatory, but you’d be surprised at how many dropped calls we get. Ideally, you’d find a landline, although those are increasingly hard to come by. Another tip from one of our developers: use a headset that allows you to talk comfortably while typing or writing – the earbuds that come with the iPhone work perfectly for this use case. Similarly, if we’ve asked you to use Google Docs or Stypi for the phone interview, make sure you have an Internet connection.
  • Be prepared. Do your research – check out our website, read some of the blogs, discover our company culture, and perhaps try a demo! This shows interest in Palantir, and it will also help you form questions for the phone screener so that you can learn more about us.
  • Have a pen and paper ready. It’ll be much easier to think through problems if you have something to sketch or write on.

The Interview

For the most part, you’ll be asked a coding question and an algorithms question. We’ve written guides on each, so check out How to Rock the Coding Interview and How to Rock the Algorithms Interview. You’ll find that a lot of the advice for on-site interviews also helps you when you’re on the phone.

Here are some tips specific to phone interviews:

  • Think out loud. While this is useful for on-site interviews as well, it’s critical for phone interviews. Since we can’t see you, the only way we can understand your thought process is to hear you talk.
  • Don’t be afraid to ask questions. If anything is unclear about the problem, ask your interviewer – that’s what they’re there for.
  • Start simple and then expand. While you do want to think about the high-level design before you write any code, it’s good to come up with a simple solution first and then go from there.

After you’re done with the technical part of the interview, your interviewer will ask you if you have any questions. Don’t worry about asking the right questions or the wrong questions – this is your chance to find out about what interests you. We often get asked about what Palantir does, what we work on individually, what it’s like to be a new engineer at Palantir, and how our internship program is structured, but don’t limit yourself to these topics!

After the Phone Interview

After you’re done with the interview, we typically get back to you very quickly – a week or two at the latest. We may ask you to do another phone interview, or we’ll bring you on-site. If you don’t hear from us, don’t hesitate to reach out. Good luck and happy interviewing!

Predictive policing: A window into future crimes or future privacy violations?


In some sense, police have long been in the business of going beyond reactive law enforcement, using information from various sources (e.g., anonymous tips and leads) as well as historical analyses to draw inferences from which to anticipate and address crime before it happens. But as policing budgets shrink and applications of predictive analytics (the catch-all phrase for a broad array of statistical analysis, machine learning, and myriad other algorithmic techniques) to the social sciences and commercial markets become more proven and ubiquitous, local law enforcement agencies have also begun to shift interest to formal, quantitative research programs collectively dubbed “predictive policing”. As paths to the systematic forecasting of criminal activities, these programs are intended to help agencies more efficiently allocate the increasingly scarce resources needed to fight crime.

Perhaps not surprisingly, however, predictive policing has also generated waves of often sensationalistic media coverage and raised serious concerns among privacy advocates. The prevailing focus on this sensationalism unfortunately obscures the more meaningful discourse on how this quantitative realm of predictive policing might—under the appropriate conditions and with applicable caveats—become a valuable and constitutionally viable component of the law enforcement arsenal. Indeed, a measured analysis of the privacy risks at stake provides an occasion to remind us that police are by and large committed to and have a vested interest in not simply enforcing the law and stopping bad guys, but doing so in a manner that can be rigorously defended if challenged in the criminal justice system. As such, it is in the interest of law enforcement sponsors of predictive policing programs to carefully evaluate how their efforts uphold privacy and civil liberty standards.

A recent article (Emory Law Journal, Volume 62, forthcoming 2012) by University of District Columbia Assistant Law Professor Andrew Guthrie Ferguson addresses the question of whether and to what extent these emerging approaches to predictive policing can impact the reasonable suspicion calculus (i.e., the set of considerations weighing individuals’ Fourth Amendment interests against countervailing governmental interests at stake in policing efforts such as pat-down searches when it is believed “that criminal activity may be afoot” (Terry v. Ohio 392 U.S. at 1)). In his discussion, Ferguson does an excellent job of both elucidating the landscape of predictive modeling regimes as well as outlining their respective privacy and Fourth Amendment implications. The article thereby provides a good framework for considering the privacy implications of predictive policing in general.

As a starting point, Ferguson provides a survey of the various techniques that have been employed under the heading of “predictive policing.” In so doing, the reader begins to appreciate that treating all predictive analytical approaches monolithically oversimplifies this landscape and does a disservice to those seeking a measured understanding of this field. The broad spectrum of analytical approaches entails a similarly broad range of privacy implications. At one extreme, algorithms that profile particular persons tend to evoke Minority Report anxieties amongst privacy advocates (i.e. concerns around wholly, almost mystically, opaque systems profiling citizens and accusing them of misdeeds they have yet to commit). Arguably more palatable approaches, such as event-based “near-repeat theory” models, focus on identifying behaviors (rather than people) that repeat known patterns under circumstances where specific environmental vulnerabilities are known to exist. These latter types are not only close kin to the well-established “high crime areas” policing paradigm, but also, as Ferguson suggests, offer a potential privacy protective refinement over the existing model.
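To make the near-repeat idea concrete: such models typically score a location by its spatial and temporal proximity to recent incidents of a given type. The sketch below is purely illustrative — the kernel form, bandwidths, and coordinates are our own assumptions, not any agency’s actual model:

```python
import math

def near_repeat_score(location, incidents, now,
                      space_bandwidth=0.5, time_bandwidth=7.0):
    """Score a location's short-term risk as a sum of kernels over past
    incidents: each incident contributes more the closer it is in space
    (here, km on a flat grid) and the more recent it is (days)."""
    score = 0.0
    for (x, y, day) in incidents:
        dist = math.hypot(location[0] - x, location[1] - y)
        age = now - day
        if age < 0:
            continue  # only past incidents inform the score
        score += math.exp(-dist / space_bandwidth) * math.exp(-age / time_bandwidth)
    return score
```

The model’s output is a risk surface, not an accusation: scores decay as incidents recede in space and time, which is exactly why such predictions have a short shelf-life, as discussed below.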

But even the more agreeable end of the predictive policing spectrum is not wholly exempt from privacy considerations when it comes to assertions of reasonable suspicion of criminal activity. The phrase “reasonable suspicion” points to a standard of certainty less than the probable cause standard explicit in the Fourth Amendment (Terry v. Ohio 392 U.S. 1) and, in that sense, can be thought of as nothing more than a lower probability of certitude. Ferguson argues that while a notion of probability inheres in both Fourth Amendment “probable cause” (and, by extension, “reasonable suspicion”) considerations and also in the “probability scoring” of many predictive policing models, the resemblance is superficial. The two are, in fact, far from constituting comparable notions of predictability and an important distinction should be drawn between the “clinical” and “statistical” applications of prediction. While the former focuses on the particularities or “specific and articulable facts” of a given investigation “taken together with rational inferences from those facts” (Terry v. Ohio, 392 U.S. at 21) in order to derive specific assertions of criminal involvement, the latter attempts to apply a statistical modeling framework to infer possible criminal involvement based entirely on training data from previously observed circumstances that (more likely than not) have no direct relationship to the particular circumstances to which the model’s predictions are applied. It’s the difference between stopping and frisking an individual observed to engage in activity that, for example, has all the outward appearances of a drug deal versus stopping and frisking a person who happens to be standing in a particular place and time at which a statistical model predicts an occurrence of a drug deal.

What distinguishes the clinical, human-driven predictions of traditional policing practice from the machine-driven, statistical variant can be further elaborated by analysis of the (albeit imperfect) analogy to anonymous tips, which have long been in use by law enforcement to trigger or assist investigations. Ferguson identifies four principles that courts have historically relied upon to differentiate between the degree of certitude implied by an “inchoate and unparticularized suspicion or ‘hunch’” (Terry v. Ohio 392 U.S. 1, 27) provided by a tip and the reasonable suspicion threshold required to motivate police action: 1) predictive tips must be individualized, i.e. specific to persons and ongoing criminal activity in which they are suspected of being implicated; 2) predictive tips must be further corroborated by police observations related to those specific persons and their suspected criminal activities; 3) the predictive value of those tips turns on the level of particularized information involved; and 4) predictions may only remain viable for a relatively short period of time absent new corroborating evidence or fresh analysis.

Put another way, it may be legitimate and defensible to employ predictive models as a preliminary factor in reasonable suspicion analysis, but, as with suspicious activity reports, tips, or leads, “the use of predictive policing forecasts, alone, will not constitute sufficient information to justify reasonable suspicion or probable cause for a Fourth Amendment stop.” (Ferguson, 26) What is lacking from the statistical prediction is the particular detail linking abstract modeling outputs to the totality of circumstances constituting the actual observed particularities of a presumed crime. Moreover, even where a predictive model is intended to be applied to augment the evaluation of such particulars, timing is critical. Models do not have an interminable shelf-life, because the environmental factors underlying their predictive power are subject to change.

Even more importantly, the models themselves are only as reliable as experience—real, hard empirical evidence—dictates. A viable and defensible predictive algorithm (i.e. one that can be justified as a legitimate aid to law enforcement), must be backed by demonstrable proof of success under well-understood circumstances not only in order to impede criminal activity but also to stand up to scrutiny if and when a well-resourced individual challenges it. The defensible application of predictive policing is therefore contingent upon a number of factors: the continued existence of environmental vulnerabilities (e.g., poor lighting making a storefront or home vulnerable to burglary), a causal logic that plausibly explains how those factors precipitate specific criminal activities, reliable and accurate data and reporting on which to base the model, and finally a sound experimental framework for evaluation of the fidelity of those predictions. In other words, insofar as predictive policing aspires to science over clairvoyance (by and large it is scientists—e.g., sociologists, applied mathematicians, statisticians—who are leading predictive policing research on behalf of sponsoring law enforcement agencies), it must adhere to sound methodology to pass empirical muster.

But even beyond methodological considerations, Ferguson hints at another set of concerns that come into play in Fourth Amendment analysis of the merits of predictive policing models: the explicability and defensibility of algorithms in the courtroom. Much as criminal prosecutions today involve defensive challenges to the methodological steps by which probable cause for arrest is constructed, future criminal prosecutions of predictive policing cases will likely require a qualified witness to speak to the questions of the underlying predictive model’s provenance, accuracy, timeliness, and reliability. Certainly few police officers would have the background or be expected to attest to these qualities. Hence, qualified statisticians and program administrators will likely need to be able to explain, when challenged in court, how these algorithms work and how their predictions are translated into actionable intelligence for police. This may be less challenging for simpler algorithms, but may become increasingly problematic as a model’s complexity swells and algorithms become increasingly opaque.

As successes accumulate, sponsoring law enforcement agencies are likely to be tempted to stretch predictive models to accommodate more general crime types, a move that will typically entail introducing increased complexity. Program administrators and researchers may be tempted to generalize models to go beyond the circumscribed categories for which they were initially developed and tested and to apply them to categories of criminal activity that may not have clean, causal, pattern-based explanations (e.g., crimes of passion), or that require adaptation to dynamic adversaries and/or changing environmental vulnerabilities. Machine learning algorithms, for example, may be employed in an attempt to address these onerous desiderata—the machine is expected to adapt in response to changing modeling parameters and/or feedback from historical applications of predictions in a way that would otherwise require laborious, iterative manual modeling effort.

Beyond the aforementioned difficulties of providing a clear explanation of how a complex predictive algorithm works when challenged in trial, machine learning surfaces a novel and perhaps more daunting set of challenges to the demands of courtroom justification. Absent human supervision to ensure empirical soundness, adaptive machine learning models may develop in ways that are either resistant or wholly opaque to introspection and later explanation. If the trajectory of predictive analytics in other disciplines is any kind of leading indicator, one can reasonably anticipate that machine learning techniques will increasingly become the focus of predictive policing research and that these seemingly speculative concerns will become all the more relevant.

As these algorithms become ever more inscrutable black boxes, the lines between reality and the mystical fiction of Minority Report “precognition” indeed begin to blur. So it isn’t any wonder that the popularized fiction of predictive policing as a kind of clairvoyance seduces not only journalists reporting on predictive policing, but also some researchers as well. Still, Ferguson’s analysis dictates a more sober and circumscribed role for predictive analytics in policing wherein defensible algorithms have the following common features:

  1. They cannot be over-reaching, i.e., they focus on a narrow set of crime types, all of which relate to environmental factors that are known to persist over time. Once those environmental factors change, the predictions lose their applicability. This also implies that most crimes of passion (including many violent crimes) do not lend themselves to predictive modeling.
  2. They look at near-term predictions. They do not forecast years in advance. As with the weather, there are far too many complex variables at play to be fully modeled and, over time, the compounded influence of these exogenous factors will degrade even the most carefully calibrated predictions.
  3. They are evaluated and proven in controlled experimental circumstances to, as much as possible, account for the influence of exogenous factors. Even in the best of circumstances, however, it is not possible to control for everything, nor can it be expected that experimental circumstances will inhere in typical policing scenarios. Which is why…
  4. Outcomes demonstrating crime reduction improvements are likely to be modest, and must be reported with careful attention to applicable caveats. The most credible predictive policing studies to date tend to suggest moderate improvements in reductions of certain classes of crime while pointing to the need for continued study (see, for example, Modest Gains in First Six Months of Santa Cruz’s Predictive Police Program, Santa Cruz Sentinel (Feb. 26, 2012)).

All of this is not to suggest that every effort under the predictive policing umbrella is legally intractable or practically unjustifiable, but rather to call out necessary considerations of the scope and limitations of plausible predictive policing regimes. When thoughtfully approached, as Ferguson points out, certain methods may in fact enhance privacy protections. For example, near-repeat predictive models (of the type currently being tested by the Santa Cruz and Los Angeles Police Departments) may offer a legitimate opportunity to tighten notions of “high crime areas” (already in heavy currency with many police forces) by refining the temporal and spatial dimensions of the high crime calculus. If—through such modeling—a police force can surgically apply pressure to specific areas at specific times and directed at specific criminal activities, the risk of privacy/civil liberties infringements can potentially be reduced. Moreover, scarce policing resources can be more efficiently utilized through such an approach.

Now, granted, all of this hinges on an “if” that is ultimately at best an empirical matter and must be borne out through careful experimentation. It also assumes the persistence of specific environmental vulnerabilities, and even with those in place, it must be recognized that the prediction alone—no matter how reliable—is not enough to constitute reasonable suspicion of a person or persons who happen to be found in the temporal and spatial crosshairs of a particular predictive model.

As a company, Palantir is proud to support the work of the law enforcement community and to enable initiatives that allow the police forces to do more with less. At the same time, we recognize that the development of predictive policing initiatives should be informed by careful consideration of the attendant privacy implications. In that vein, we approach potential engagements with predictive policing with a commitment to doing so in a thoughtful, rigorous, and ethically and legally responsible manner.

Grameen Foundation & Palantir: Partners for Food Security


A Grameen Foundation Community Knowledge worker speaks with a Ugandan farmer.

A piece recently published on both the Scientific American and Fast Company’s CoExist websites highlights our most recent work with the Grameen Foundation. We participated in Hacking for Hunger, a first-of-its-kind hackathon held by the Office of Innovation & Development Alliances at USAID, which focused on global food security issues. From the article:

Palantir took 28,000 geo-located soil samples Grameen had taken from across Uganda and combined them with data on soil types, population, income, and other factors. The developers hope the system can also help identify potential disease outbreaks, and help create an alert system for farmers who might be affected.

Read on for more about the data-driven work that Grameen Foundation is doing, pioneering the use of smartphones as two-way information conduits to rural farmers in Uganda, and how Palantir is helping with those efforts.

Grameen Foundation’s CKW Program

Video submission to the Hacking for Hunger hackathon

Grameen Foundation is the charitable organization inspired by the work of Grameen Bank, one of the first organizations to pioneer microfinancing loans for the impoverished in the developing world. The foundation grew out of the success of the bank, and is a separate entity working on lifting people out of poverty. From their website:

Grameen Foundation helps the world’s poorest, especially women, improve their lives and escape poverty by providing them with access to small loans, essential information, and viable business opportunities. Through two of the most effective tools known – small loans and the mobile phone – we work to make a real difference in the lives of those who have been left behind: poor people, especially those living on less than $1.25 per day.

A CKW – t-shirt reads, “ASK ME ABOUT FARMING.”

One of their units, the AppLab, builds mobile phone applications that leverage the power of cell networks to disseminate information to those who have the least access to it. The Community Knowledge Worker (CKW) Program hires people in Uganda and provides them with Android phones running an open source, custom agricultural information app with an entire data collection platform behind it. These CKWs then travel around Uganda, engaging with rural farmers to provide them with essential information to help them be better farmers. Everything from market pricing to best practices to disease information is included in the application, and cached for offline access when the CKW is off the grid.

The first time CKWs meet a farmer, they register him or her in their phone, record some brief demographic information (“How many children under 11 do you have?”, “Do your children have shoes? Clothes?”, “What do you use for cooking fuel?”), and start answering questions. Most of these farmers live outside the coverage of Ugandan cell networks, but the phones use their GPS satellite signal to record the exact time and location of each query. When the phones return to the grid, all of the data about the queries is uploaded to a central server. The upshot is a perfectly geocoded dataset that maps out a large amount of the rural farming done in Uganda.

Object Explorer plot of acreage farmed vs. count of farmers (log scale)

Of all the non-profits that the Philanthropy team has engaged with, the Grameen Foundation has some of the finest data we have ever encountered. They know that their data and their two-legged sensor network have value, and they are widely regarded as having the best data on agriculture in Uganda.

Our Work with Grameen Foundation

Object Explorer plot of query distribution among categories.

Palantir’s Philanthropy team added Grameen Foundation as a partner in mid-2012, having first met them at DataKind‘s San Francisco Data Dive.

One of the first workflows that we were able to uncover in their data (highlighted in the video submission) is using query data to track the outbreak of crop and livestock illness. It is possible to see the progression of blights in the query data, which was not foreseen when the CKW data set was being collected. Once the data was imported, it only took about twenty minutes to start mapping out the spread of a chicken blight.

Heatmap of baby chicken blight: Animals Chicken Local Diseases Whitish Diarrhoea In Chicks Less Than 2 Weeks Old And High Death Rate

Public presentation of our work with Grameen Foundation began earlier this year with a presentation by Philanthropy Team Lead Jason Payne to a USAID-sponsored conference on African food security as part of the G8 meeting in Chicago. Here’s a recorded version of that earlier presentation:

Next Stop: Hunger Summit

Our work in the hackathon was a success. We integrated a new dataset collected by the CKWs—the 28,000 soil samples mentioned in the article—and built up some new workflows around the data. We were one of four finalists chosen as part of the hackathon. Grameen Foundation and Palantir will be traveling with USAID to the Iowa Hunger Summit (put on by The World Food Prize Foundation) on October 16th to present our work.

Elbow Licking in Sudan: The Spread and Decline of Mass Unrest in Summer 2012


Disputes between Sudan and newly independent South Sudan led to a halt in oil production in early 2012, bringing an economic crisis to both countries. On June 16, Sudanese President Omar Al-Bashir and his National Congress Party imposed austerity measures, including the withdrawal of wheat and fuel subsidies. Demonstrations against the regime broke out at universities across Sudan in response.

Despite a National Congress Party official’s claim that trying to oust Omar Al-Bashir was like “trying to lick your own elbow,” protests quickly spread and intensified, led by student activist groups such as Girifna, meaning “We are fed up.” The movement was co-opted by opposition parties, who turned out hundreds of worshippers at affiliated mosques to demonstrate against the regime.

However, although these demonstrations were larger and persisted for longer than any Sudanese protests had in many years, they began to die out by mid-July and had almost completely dissipated by August. This video shows how Palantir Gotham can be used to integrate news media and other open source data to deliver a comprehensive understanding of major events as they unfold. In this case, we have been able to track the conflict and assess the factors behind the escalation and subsequent de-escalation of unrest in Sudan.

Announcing the Palantir Council on Privacy and Civil Liberties


Last month, at Palantir’s GovCon8 event, our CEO, Dr. Alex Karp, announced the creation of the Palantir Council of Advisors on Privacy and Civil Liberties (PCAP). This Council of experts has been created to assist us in understanding and addressing the complex privacy and civil liberties (P/CL) issues surrounding the use of our platform to aggregate and analyze data in the many areas in which our customers work.

Over the course of the last couple of years, Palantir’s Privacy and Civil Liberties Team has assembled a group of some of the top P/CL academics and advocates in the world to advise us on P/CL issues related to the use of our platform. We have gathered this group (or a subset of it) every few months for in-depth discussions of various topics, such as the P/CL implications of supporting social media analysis or how to build P/CL protections into Palantir Video. These meetings have provided us with invaluable guidance as we try to responsibly navigate the often ill-defined legal, political, technological, and ethical frameworks that sometimes govern the various activities of our customers. In addition, while we do not collect or analyze data ourselves, we have occasionally leveraged the expertise of this group to provide help and advice to our customers who do.

The PCAP will effectively play the same role that our informal group of advisors has played to date. We will consult them for advice on identifying and responding to P/CL issues that may be raised with various current and potential customers (respecting, of course, the desires of those customers to protect their own confidentiality), and they will be asked to consider the P/CL implications of product developments and to suggest potential ways to mitigate any negative effects. The PCAP also will help us think about future developments in technology, how law and policy might change to account for that technology, and what steps Palantir might be able to take to help address these “over the horizon” challenges.

The PCAP will initially consist of the following members:

  • Susan Freiwald – A law professor at the University of San Francisco who frequently participates in electronic surveillance litigation efforts.
  • Robert Gellman – A privacy and information consultant who previously worked for nearly two decades on privacy issues in the U.S. Congress.
  • Chris Hoofnagle – A lecturer at University of California – Berkeley and Director of the Berkeley Center for Law & Technology’s Information Privacy Programs.
  • Stephanie Pell – A private consultant specializing in P/CL issues who formerly served in the Department of Justice as an Assistant US Attorney and later Senior Counsel to the Deputy Attorney General.
  • Jeffrey Rosen – A law professor at George Washington University, author, and frequent commentator on P/CL issues.
  • Dan Solove – A law professor at George Washington University, author, and founder of TeachPrivacy, a company that designs privacy and security training programs.
  • Daniel Weitzner – Co-founder of the Center for Democracy and Technology, former White House Deputy Chief Technology Officer for Internet Policy, and current Director of the CSAIL Decentralized Information Group at the Massachusetts Institute of Technology.

Bryan Cunningham, an information privacy lawyer and a long-time senior advisor to Palantir, will serve as the Executive Director of the PCAP.

We are enormously grateful to all PCAP participants for sharing their valuable time to advise us on these issues. We deeply respect their expertise and experience, which is why the PCAP will function under a set of rules designed to protect their integrity as independent experts. PCAP members will sign a non-disclosure agreement (NDA) that is modeled after those used by several advocacy organizations when working with private companies. PCAP members will be free to discuss anything that they learn in working with us unless we clearly designate information as proprietary or otherwise confidential (something that we have found necessary only on very limited occasions).

The PCAP is advisory only – any decisions that we make after consulting with the PCAP are entirely our own and we will not ask them to publicly endorse anything we do. They will be free to criticize aspects of our work with which they disagree.

The field of P/CL experts is expansive and diverse, and although we have included some of the best on the PCAP we know that there are many more experts out there. Consequently, we will not rely solely on the PCAP for P/CL advice. We will continue to work with other experts and advocacy organizations on a regular basis to ensure that we are giving each issue the thorough consideration that it deserves.

Palantir is very proud to have assembled this group of experts, and we believe that they will be invaluable in helping us to fulfill our duty as a good corporate citizen to protect P/CL. Look for regular updates on the activities of the PCAP in this space.


Palantir Hack Week 2012

Hack Week 2012

Hack Week is a hallowed tradition at Palantir, a time when all scheduled work stops and our engineers spend a week exploring new ideas, playing with different technologies, and building something from the ground up. People from across the company form ad-hoc teams and scramble to complete projects in one week of frenzied coding. In the video above, we take a quick look at Hack Week 2012.

For a lot of engineers at Palantir, Hack Week is the best holiday of the year. It’s an event that showcases the best parts about working here. Experimenting with side projects is encouraged year-round, but Hack Week is a special time during which we place particular emphasis on creativity and novel thinking. At the end of the week, teams present their projects to a panel of judges who determine a winning team, but Hack Week presentations aren’t about competition. They are celebrations of our engineering culture and a unique opportunity for everyone at Palantir to see the creativity and passion of our engineers expressed in the form of new, inventive ideas about where we could take our technology next. Many important parts of our platforms started as Hack Week projects, including Horizon, Palantir Mobile, and the Map Application.

Hack Week exemplifies the kind of immediate impact that an engineer can have by working at Palantir. Building on top of an existing platform with help from people across the organization makes it possible to produce dramatic results and usable features in a short amount of time.

Introducing Code 33

Code 33 Map

Code 33: the new Palantir Gotham web client

One of the most exciting things about Palantir is that nothing is sacred when it comes to our technology. We’re constantly asking ourselves, “What would we do differently if we were starting from scratch?” We’ve completely redesigned major features, introduced entirely new applications, and even retired outdated functionality. The next big change is possibly our most ambitious yet: rather than changing features within the client, we’re redefining the client itself.

By late 2011, the consensus among the product team was that if we could write the front end all over again, we’d use web technologies rather than Java Swing. Of course, when Palantir launched, Swing was very good to us. It enabled a rich graphical interface that web languages couldn’t replicate at the time. By using Java Webstart we could avoid local installs and load the client on any machine (though this also required large downloads and the installation of Java, which created the potential for release mismatches). Times change, however, and Palantir has changed with them, which is why we (including our stellar Summer 2012 intern class!) have been hard at work developing an entirely new, fully-featured web client. We call it Code 33.

Code 33 brings Palantir Gotham to the web, resulting in our fastest, smoothest, and most beautiful product yet. Code 33 is written in HTML5 and is compatible with all modern web browsers. It also incorporates current open source web technologies, including CoffeeScript (for JavaScript development), LESS (for styling), Handlebars (for templates), jQuery (for client-side scripting), and Backbone (for structure). From a developer’s standpoint, it’s been a dream to work on, and we’re pretty confident that users will feel the same way.

Here are just some of the major benefits we anticipate from bringing Palantir Gotham into the web world:

Increased Accessibility: As a fully web-enabled product, Palantir Gotham will be easier than ever to deploy. Any modern web browser will work, with no Java or special plug-ins required. Single sign-on will be an option for many customers, and all users will enjoy lower bandwidth requirements and decreased download and launch times.

Improved Usability: For new users, walk-up usability will be greatly enhanced through web UI features that are already familiar and comfortable. For experienced users, the classic design elements of Palantir will all be there in a fluid web interface.

Ease of Extensibility: Code 33 embraces open web standards and a plug-in architecture, simplifying the creation of new helpers and applications. Additionally, while we chose CoffeeScript and Backbone, it’s possible to build applications on top of Code 33 using different frameworks.

Standalone Products for Specialized Users: Code 33 is built as a modular, single-page application, making it easy to prototype and field standalone tools for specific use cases. Special products already in development include a web-based document reader with integrated search and feeds, and a browser-based GIS application that displays rich, annotated maps shared by Palantir Gotham workspace users.

Code 33 in the Cloud: Code 33 is fully compatible with government and enterprise clouds, and leverages Palantir Gotham’s distributed computing framework. Code 33 delivers the Palantir Gotham client in the browser, but moves the computational requirements of the UI from the client machine to the server in order to streamline the user experience.

It’s also worth noting that Code 33 continues to use Palantir’s standard Java-based back-end servers, allowing users to enjoy the best the web has to offer while taking advantage of years of innovation in processing, performance, scale, and security. Just as Java Webstart allowed us to provide the best aspects of both thin and thick clients five years ago, Code 33 gives users all the benefits of enterprise computing power in the most lightweight package possible. This has been a continuing theme in Palantir’s evolution. We’ve developed the platform around core principles and building blocks designed to stand the test of time, and these give us tremendous flexibility and freedom to incorporate the best new technologies, even those that haven’t been invented yet.

The Future: The first public glimpse of Code 33 came at Palantir’s GovCon Secure this past July, and the response was overwhelming. The following month, during Hack Week 2012, many of our engineers used the Code 33 developer environment to build dozens of new plug-ins, data integrations, and application modules. Operational rollout of Code 33 will begin for select customers in early 2013. Stay tuned for more in the months ahead on Palantir’s blogs, or at your local deployment. And check out the additional screenshots below of the Search and Graph applications. [Note: all data is notional.]

Code 33 Search

Code 33 Graph

One last thing: why is it called Code 33? The name was inspired by the Code 33 blend made by our beloved Philz Coffee. Coincidentally, this is the blend Philz developed to help the San Francisco Fire Department stay up all night, and we can confirm that it does the trick!

Happy Veterans Day from Palantir

Philanthropy Engineers embed with Team Rubicon for Hurricane Sandy Relief

Since 4 November, Palantir’s Philanthropy Engineering team has been supporting Team Rubicon’s Hurricane Sandy relief efforts. We have engineers on the ground in Brooklyn and Far Rockaway who are standing up, operating, and training volunteers on our software, while others provide reachback support from our offices across the country. This is an exciting time for Philanthropy Engineering at Palantir not only because we are supporting such an inspiring mission, but because it is the first time that we’ve deployed engineers to a disaster area to support relief efforts directly. Several of our Philanthropy Engineers have been writing up reports from the field, and we want to share them here since they best communicate what this effort is all about.

But first a little background. Team Rubicon is a non-profit relief organization that dispatches volunteers to disaster zones from its network of nearly 5,000 military veterans throughout the United States. Their mobilization in New York and New Jersey in the aftermath of Hurricane Sandy has rightfully earned the attention of local and federal government and the media alike. You can read about their efforts on the White House blog and the New York Times “At War” blog, or you can watch this recent segment on NBC Nightly News.

We have donated our software and technical expertise to help Team Rubicon streamline the process of receiving, responding to, and closing out requests for help from residents whose homes were damaged by Hurricane Sandy. The video below from this story by WNYC provides a glimpse of how our software is being used to get more residents the help they need, faster.

Dispatches from a Forward Deployed Philanthropy Engineer

Spending their nights at a base camp in a warehouse next to a rock-climbing gym in Brooklyn and their days at a makeshift forward operating base in a school bus on Rockaway Beach Boulevard, our Philanthropy Engineers have been on the ground for a week and a half now. After a first wave of engineers stood up the software and endured a nor’easter that hit the area with freezing temperatures, 60mph winds, and several inches of snow on Tuesday night, we immediately began to see the impact that we could have on Team Rubicon’s everyday operations. One of those in our second wave of Philanthropy Engineers has been keeping a journal of his time on the ground. These dispatches from the field tell the story.

Sunday 11 November

From 6:30 am to 6:00 pm, we helped Team Rubicon handle several thousand volunteers and several hundred simultaneous projects by distributing data entry and scrubbing duties to dozens of both trained volunteers and Palantirians who were monitoring the Rubicon instance. Eight mobile devices running Palantir Mobile spent the day feeding disparate data into the instance—pictures of residents standing in front of their damaged homes, phone numbers, information about the work crews and tools needed to help them, etc. This information was modeled and analyzed by a team of Palantirians located around the country and then acted on within hours by teams of volunteers. To give just one example of how critical this reachback support has been, one of Team Rubicon’s mobile operators almost ran out of battery life until a Palantirian in North Carolina noticed his battery level remotely and emailed us in NYC, whereupon we sent out a Rubicon member with an extra battery who was able to find the operator in need by using Blue Force Tracking in Palantir Mobile.

Data entry was accomplished via any and every means available: texting pictures of hand-written forms brought in by local residents; texting data from those forms via SMS to Brian in Brooklyn; entering data into Webflow on an iPad in the hands of a Rubicon volunteer offsite at the headquarters of another aid organization that wished to coordinate with Rubicon. Our networked command-and-control system, designed to work around the dial-up level of bandwidth available on the Rockaway peninsula during most of the day, was so well-oiled and managed by Brian and Zach that Alec and I were simultaneously able to work and train two Rubicon employees to a high level of proficiency with Palantir Gotham. They are now able to perform 90% of the tasks necessary to keep Rubicon functioning at optimal efficiency.

An encouraging day, but there’s more work to be done. We are in need of long, deep ontology conversations over dinner. There’s a lot to think about.

Volunteers entering data by SMS

Monday 12 November

We started the morning slightly overwhelmed as we walked into our Rockaway operations center and saw the piles of yesterday’s work orders distributed at random on the tables, chairs, and floor. Some of them were covered with handwritten notes while others were torn into patterns that seemed to evince a mysterious and forgotten organizational artform. Priority number one immediately became the acceleration of our plan to replace paper work orders with direct Rubicon-team-leader-to-Palantir-instance dispatch. This will be up and running tomorrow morning thanks to our smart, eager Rubicon trainees and our support from Palantirians across the country. Steve paid us a very welcome visit in our warehouse office this evening to deliver laptops.

To clean up our data and determine who had already had a visit from a work crew, be it Rubicon or anyone else—several groups of volunteer contractors and spontaneous aid organizations were operating in the Rockaways on Sunday—we opened Palantir Phone Bank, run by a Rubiconian sifting through the Browser app and managing three volunteers calling hundreds of open work cases to determine their current need.

As the morning wore on, our Rubicon users took over more and more of our tactical management responsibilities. Data entry, data updates, data scrubbing and dispatch were all handled by Rubiconians. Alec and I are still doing some minute-to-minute work. When our Alaskan firefighter chainsaw-wielding wood-chipping expert walked in asking for trees to attack, in five minutes Alec had him a printed map of every NYC 311 call in four square miles reporting a downed tree. But when 50 merchant marine cadets walked up looking for work, it was Tina, a Rubiconian, who, on her second day using Palantir Gotham, pulled up all the open work orders, quickly ran a histogram on crew size needed, threw all the big jobs on the map, and five minutes later handed them a package of eight 10+ person jobs all along the end of one particularly hard-hit street.

About 30 local contractors and carpenters are taking the day off tomorrow to volunteer, and they heard Rubicon might be able to help focus their efforts. It was pretty fun watching their jaws drop when I handed them a packet of nine cases within walking distance of one another topped by a map pinpointing each residence with its job type and contact info. The icons on the map were coded according to the jobs’ priority. Two red icons, for example, indicated two residents who were elderly and had no family in the area to help them. The priority coding system, which will be refined by an ontology update tonight, lets us respond to all of Rubicon’s and the local community’s highest priorities. A local synagogue, for example, which yesterday morning requested help from a 50-plus-person work crew to haul out tens of thousands of pounds of sand and destroyed furniture, is as of this afternoon ready for rebuilding.

Most exciting of all, perhaps, was our visit from celebrity chef Rocco DiSpirito, who brought us all lunch in the form of “gourmet soup” and garbage bags full of fresh bread. “This bread,” said one of his kitchen helpers in a reverent whisper as she handed Alec a couple of pieces, “is very expensive.”

A good day’s work.

Color-coded Work Requests in Rockaway as viewed in Palantir Gotham’s Map application

Tuesday 13 November

Today, it rained. Rain makes us wet. Being wet makes us cold. Being cold makes us put on gloves. Wearing gloves makes typing 44% less accurate than not wearing gloves.

The weather delayed our plan to completely replace paper with interactive Palantir case checkout kiosks, as the contingency plan of setting up the laptops under the tent was immediately made ridiculous by the horizontal water whipping through our parking lot. Our new data model still needed some fleshing out anyway, so the extra day was a good chance to do some more thorough scrubbing, prepare for the nightly correction of mistakes we’ve made, and finalize our new ontology while simultaneously entertaining a seemingly endless parade of visitors to our ever-expanding operations center. I sit in the back now, and try to stay out of the way of the Rubiconians who are actually getting work done while I field phone calls and coordinate meetings. But it’s fun as long as we can keep inventing new systems that do neat things. Like radically different ontologies that allow us to scale out to cover the entire city in a logical way, if that’s something we ever decide we want to do. It’s fixed now, and ready to go. All in a rainy day’s work.

Patrick now mans the Google Voice phone line that Alec set up, which all of the team leaders text with progress reports. Patrick reports time-sensitive ones as needed, and then periodically pastes the entire log into a notepad file which he then imports into the instance. Rubicon analysts in the front of the bus then go through that dispatch file and tag the updates to property changes on the open cases. Number of data-contributing users on the Rubicon instance: 40. Number of analysts contributing to a typical object by the time it is complete: 5–8.

Alec and I have not yet showered in New York State, which is great in some ways—no one seems to visit our corner of the warehouse anymore, so we get way more work done around dinnertime—but probably not so great for hygiene. We’re going to go clean up and try to get more than four hours of sleep tonight.

Oh, and we got gourmet soup. Again. Rocco DiSpirito drove up by himself, dropped it off, and left with a quiet word of thanks.

A Philanthropy Engineer shows the Rubicon instance to an NYPD officer aboard the green bus operating base

What’s Next

Our Philanthropy Engineers will remain embedded with Team Rubicon for as long as they are needed. We are incredibly proud to be supporting such an inspiring organization and we’re excited to see what more we can accomplish in the coming days and weeks.

Panel features Palantir Women on Working in Tech

On November 13, Pooja Sankar, CEO of Piazza, an online Q&A platform used by students and teachers, sat down with four women at Palantir to discuss their experiences working in technology. In the panel, which was broadcast live to students in Piazza’s network, the panelists spoke about their roles at Palantir and how they’ve navigated the ups and downs of working at a rapidly growing company in a fast-paced industry. Sankar not only moderated the panel but also shared a bit of her own experience as a former graduate student, new mother, and CEO.

If you’re grappling with the decision between pursuing a graduate degree and working in industry, or wondering how gender balance in the workplace affects day-to-day work, check out the video to hear the panelists’ thoughts and opinions.

Panelists include:

  • Dana Kleinerman, Tech Writer
    Dana graduated from Cornell University with a Bachelor’s degree in Math and a Master’s in Computer Science, then went on to complete a Post-Baccalaureate program at the University of Pennsylvania studying advanced sciences. As a technical writer at Palantir, Dana has found a way to combine her background in Computer Science with her passion for writing.
  • Danielle Kramer, Software Engineer
    Danielle earned her degree in computer science and cognitive science at Carnegie Mellon University, where she researched machine learning and served as a teaching assistant for Great Theoretical Ideas of Computer Science and Principles of Programming. She is a software engineering lead on Palantir’s Infrastructure team, which is responsible for storing, searching, scaling, and sharing the data that powers Palantir Gotham.
  • Ashling Loh-Doyle, Designer
    Ashling graduated from Stanford University in 2010 with degrees in Economics and Studio Art, and found it impossible to resist the world of tech. She spent her first two years at Palantir as a graphic and product designer, and is currently building the company’s Identity team, which is responsible for internal and external communication.  
  • Yael Schraeger, Product Navigator
    Yael earned her Bachelor of Science at Stanford in Symbolic Systems (with a minor in dance!), and completed her PhD at UC San Diego. She is a product management lead at Palantir, where she works with Engineering and Business Development to decide the direction of the company’s software product.

Are you a woman currently pursuing a degree in a technical field? Know someone who is? Encourage her to check out Palantir’s Scholarship for Women in Tech.
