Contact information


Pittsburgh, PA


Email: bozena @

Twitter: @BozenaPajak

Learning Science at Duolingo

    I'm the head of Learning Science at Duolingo. Our mission is to improve Duolingo's effectiveness through learning science and pedagogical expertise.

    We are responsible for improving Duolingo's pedagogical approach and for assessing how much our users are learning. We draw on recent research findings, as well as Duolingo's vast user data, to guide product development.

Academic research interests

  • language acquisition in adulthood
  • statistical learning
  • bi/multilingualism
  • non-native speech perception
  • recognition of foreign-accented speech
  • adaptation to non-native accents
  • computational modeling of learning and adaptation


Learning languages as hierarchical probabilistic inference

collaborators: Roger Levy, T. Florian Jaeger, Alex Fine, Dave Kleinschmidt

My main research interests include second and additional language acquisition, especially in the domains of phonetics, phonology, and morpho-phonology. I am particularly interested in the mechanisms underlying generalization from previously acquired knowledge, which I study through behavioral experiments and computational modeling.

More specifically, I investigate how people integrate multiple sources of information to make inferences about the language input they are exposed to. I am particularly interested in:

  • how listeners' prior language background affects how they interpret the statistical structure of the incoming speech signal, and
  • how learners generalize from the properties of known languages when learning a new language.

I have proposed a model of second (L2) and additional (Ln) language acquisition as a process of probabilistic inference under uncertainty, where known languages serve as a resource that learners use to make implicit inferences about the properties of new languages. The model constitutes a novel way of approaching L2 and Ln acquisition and allows us to ask questions about generalization that have not been previously addressed. Other existing theories have instead focused on cases of native language (L1) interference, with facilitation (or positive transfer) predicted only when some linguistic properties happen to coincide in L1 and L2.

The main prediction of the model is that, if learners indeed use their previous language knowledge as a basis for inferences about the language they are currently learning, then we should be able to find evidence of L1-to-L2 facilitation that goes beyond immediate similarities between the two languages and that cannot be explained by direct positive transfer. That is, there should be signs of generalization from the properties of known languages reaching beyond the actual data available to a learner. In my work so far I have tested this general prediction in the areas of speech perception and speech category learning. The results show that learners are able to take advantage of their current linguistic knowledge by generalizing fine phonetic detail across different segments and across languages.

relevant papers: Pajak (2012); Pajak & Levy (2014); Pajak, Fine, Kleinschmidt, & Jaeger (in revision)


Analogy-based abstraction in learning non-native phonetic categories

collaborators: Roger Levy, Klinton Bicknell, Page Piccinini

Phonetic category acquisition is a complex problem of learning a mapping from variable phonetic tokens onto discrete categories. How is this achieved? Prior experimental and computational work has identified two main sources of information available to and used by learners, both infants and adults: statistical distributions of sounds and lexical context. I have proposed that, in addition to those two sources of information, phonetic category learning is supported by analogy-based abstraction: learners infer commonalities between observed phonetic contrasts (e.g., /b/-/p/, /d/-/t/), which leads them to expect analogous contrasts defined along the same phonetic dimensions (e.g., /g/-/k/). This type of analogical abstraction can effectively bootstrap the acquisition of a language's entire phonetic system given the typological evidence that languages tend to reuse the same phonetic dimensions for multiple contrasts (Clements, 2003).
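
The intuition behind analogy-based abstraction can be sketched with a toy calculation. This is only an illustration under simplifying assumptions, not the model from the papers listed below: it treats "this phonetic dimension is used contrastively at a given place of articulation" as a yes/no outcome and applies a Beta-Bernoulli update with a uniform prior. The function name, the prior, and the example observations are all hypothetical.

```python
from fractions import Fraction


def predictive_contrast_probability(n_contrastive, n_observed, prior_a=1, prior_b=1):
    """Posterior predictive probability that a not-yet-observed place of
    articulation also uses the dimension contrastively.

    Assumes a Beta(prior_a, prior_b) prior and independent yes/no
    observations (a Beta-Bernoulli model). This is an assumption made
    for illustration only.
    """
    a = prior_a + n_contrastive
    b = prior_b + (n_observed - n_contrastive)
    return Fraction(a, a + b)


# Hypothetical learner observations: is voicing contrastive at each place?
observed = {
    "labial (/b/-/p/)": True,
    "alveolar (/d/-/t/)": True,
}

n_contrastive = sum(observed.values())
n_observed = len(observed)

# Expectation of a velar contrast (/g/-/k/) before and after the observations.
before = predictive_contrast_probability(0, 0)
after = predictive_contrast_probability(n_contrastive, n_observed)

print(f"P(velar contrast) before any evidence: {before}")
print(f"P(velar contrast) after two analogous contrasts: {after}")
```

Under these assumptions, observing two contrasts along the same dimension raises the expectation of a third, analogous contrast from 1/2 to 3/4, even though no velar tokens have been observed.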

relevant papers: Pajak & Levy (2011); Pajak, Bicknell, & Levy (2013); Pajak & Levy (2014); Pajak, Piccinini, & Levy (2014, ASA poster)


The basis of generalization in adaptation to foreign-accented speech

collaborators: Ann Bradlow, Matt Goldrick

Previous work has shown that adaptation to foreign-accented speech involves generalization beyond talker-specific characteristics: listening to multiple same-accent talkers facilitates comprehension of a novel talker of that accent (Bradlow & Bent, 2008; Sidaras, Alexander, & Nygaard, 2009), and exposure to multiple novel accents facilitates comprehension of a completely novel accent (Baese-Berk, Bradlow, & Wright, 2013). I am interested in what underlies this type of generalization. To answer that question, I pursue computational modeling work, investigating what specific properties of foreign-accented speech lead to facilitation in comprehension of novel talkers and/or accents.


Speech-in-speech perception of native and foreign-accented talkers

collaborators: Angela Cooper, Ann Bradlow

How do people perceive speech in the presence of competing speech? One relevant finding is that the comprehension of native speech is facilitated when the background speech is foreign- rather than native-accented (Calandruccio, Dhar, & Bradlow, 2010). But why would that be the case? In this project we examine two possible reasons: (1) speech segregation may be facilitated whenever two speech streams are dissimilar along some dimension, or (2) native speech may capture listeners' attention in a way that foreign-accented speech does not. In order to tease these possibilities apart we employ a speech-in-speech sentence recognition task where the talkers are either native speakers of English or second-language learners who speak English with a foreign accent.

Last updated: 10-Oct-2018