A theoretical model of micro-learning for second
language instruction
Hou Keat Khong & Muhammad Kamarul Kabilan
The notion of “Micro-Learning” (ML) has been repeatedly
highlighted as a successful learning approach in different
learning phenomena. Despite these optimistic emphases,
several studies lack a theoretical grounding in their adoption of
ML, thus missing a shared perspective of the education
community. The scarce theoretical justification for understanding
the nuanced dynamics of ML restricts the practical
use of this pedagogical approach in “Second Language”
(L2) instruction. Therefore, this paper seeks to fill the gap
by proposing a theoretical model of ML for L2 instruction.
First, a brief background on ML is provided, evaluating its
benefits and pitfalls in the general teaching and learning enterprise.
Second, three established theories are explicitly discussed
based on a careful examination of the conceptual
characteristics and empirical observations of ML. A theoretical
model of ML is then devised based on relations postulated
among the proposed theories, and the application of the
model to existing L2 MLs is made explicit. Finally, implications
for research and practice are discussed to offer a
more robust and descriptive picture of how ML can promote
L2 teaching and learning across different contexts.
Drawing from these theoretical insights, a principled way
of integrating ML into L2 instruction can be made available
for future research.
List of abbreviations: ML: Micro-Learning; L2: Second
Language; CLT: Cognitive Load Theory; CTML: Cognitive
Theory of Multimedia Learning; SDT: Self-Determination
Theory; SLA: Second Language Acquisition; CL: Cognitive
Load; ICL: Intrinsic Cognitive Load; ECL: Extraneous
Cognitive Load; GCL: Germane Cognitive Load; CATLM:
Cognitive-Affective Theory of Learning with Media; EVT:
Expectancy-Value Theory; ZPD: Zone of Proximal
Development; SEM: Structural Equation Modeling
Keywords: cognitive load theory; cognitive theory of multimedia learning; microlearning; second language instruction; self-determination theory of motivation
ML is a technology-mediated learning approach whereby learners are
directly exposed to short-term learning activities formally or informally.
These activities are created based on well-planned microcontent by
means of real-time micromedia environments to construct microknowledge
whereby the 3Ms (microcontent, micromedia and microknowledge)
embody the uniqueness of ML (Hug, Lindner, & Bruck, 2006a, 2006b;
Lindner & Bruck, 2007). According to Lindner (2007), micromedia refers
to digital atomized media, including mobile networked devices, that deliver
reusable, small and self-contained pieces of digital microcontent to
facilitate the construction of single-focus microknowledge. In recent literature,
it is observed that increasing emphasis is placed on three key
aspects in current definitions of ML: Technology, content, and learner.
ML is repeatedly associated with mobile and digital technologies (technology)
emphasizing at the same time the design and development of
microcontent and the flow of micro-activities (content). At the micro
level of learner groups, individual learners including their attention span
and motivation (learner) are central in the modern conceptualization of
ML. Keywords like nuggets, coherent, self-contained, well-planned,
spaced, interactive, immediate feedback, ubiquitous, just-in-time, personalized,
adaptive, autonomous, motivational, self-regulated, among others,
are widely used (e.g., Baldauf, Brandner, & Wimmer, 2017; Göschlberger
& Bruck, 2017; Park & Kim, 2018; Sun, Cui, Yong, Shen, & Chen, 2018).
These keywords signal an inclination to improve and expand the concept.
The main objective of this paper is to propose a theoretical model of
ML for L2 instruction. In Section 2, a brief background on ML evaluating
its potential benefits and pitfalls in general teaching and learning
enterprise is provided to familiarize readers with the core concept of ML.
This also helps address the contradictions and challenges that ML has
had to contend with. In Section 3, three established theories pertinent to
ML are explicitly suggested and discussed based on a careful examination
of ML’s conceptual characteristics and empirical observations. This theoretical
proposition is intended to provide deeper insights into the
nuanced dynamics of ML, thereby justifying its practicality as a successful
learning approach in diverse domains in general and in L2 instruction
in particular. Based on the dynamic interplay of cognitive and
motivational theories within the concept of ML, the tripartite theoretical
model of ML is devised in Section 4 to offer a principled way to integrate
ML into L2 instruction across diverse languages and learning settings.
In Section 5, a practical way to apply the model to existing MLs is
made explicit hierarchically in accordance with Cognitive Load Theory,
Cognitive Theory of Multimedia Learning and Self-Determination
Theory to offer a descriptive understanding of different perspectives of
the ML model for L2 instruction. Finally, in Section 6, implications for
research and practice along with the motivation and future vision for the
study are discussed to offer a more robust and descriptive picture of how
ML can promote L2 teaching and learning across different contexts.
2. Background on micro-learning
ML, like any other technology-mediated approach to general teaching
and learning, has its benefits and pitfalls. On the one hand, ML has repeatedly
been corroborated as a successful learning strategy in the area of
corporate learning and development mainly because this convenient
small-step learning leverages “the ubiquity, intimacy and usability of
mobile devices” (Bruck, Motiwalla, & Foerster, 2012, p. 539) to support
“work-integrated learning” (Decker, Hauschild, Meinecke, Redler, &
Schumann, 2017, p. 132) in today’s hectic working environment to avoid
information overload. ML has also been highlighted to have the potential
to empower self-directed lifelong learning (Buchem & Hamelmann,
2010) and support the development of learner autonomy (Nikou &
Economides, 2018). These claims are supported by the fact that
smaller-chunk, single-topic, and autonomous MLs are easier to
integrate into daily and timely learning routines. In addition, ML is
often associated with mobile learning by dint of its personalized, situated,
authentic, spontaneous and informal attributes (e.g., Cates, Barron, &
Ruddiman, 2017; Edge, Fitchett, Whitney, & Landay, 2012). This association
has led researchers to consider ML in mobile Massive Open Online
Courses of various disciplines (e.g., Sun, Cui, Yong, Shen, & Chen, 2015;
Sun et al., 2018). In a nutshell, technology-mediated approaches like ML
have begun to stimulate new concepts and strategies to support effective
learning in a more flexible manner.
On the other hand, there are contradictions that highlight some potential
pitfalls of ML. Neelen and Kirschner (2017) raise some important
concerns about ML definitions. They point out a notable lack of definitional
consensus concerning ML. Besides, Shackleton-Jones (2016)
calls ML the next big bad idea. He argues that “content does not
become more useful by breaking it into smaller pieces” (para. 9) and
equates ML to content dumping which discourages learning. He contrasts
ML with disrupting learning where the latter focuses on resources
building and needs analysis to support meaningful learning. Similarly,
Jomah, Masoud, Kishore, and Aurelia (2016) highlight an important
limitation of the concept that “micro-learning is NOT useful when people
need to acquire/learn complex skills, processes, or behaviors” (p.
104). This limitation is congruent with the perspective of cognitive load
theory which relates human working memory limitations to learning
implying that a complex skill can usually be acquired and automated
after a relatively long period of repeated exposure and never through one
single practice. Cutler (2014) also foregrounds the possible overreliance
on and misuse of ML by teachers and learners and predicts that problems
may surface when learners become “overly dependent on this mode
of instruction” (para. 10). These contradictions signal a need for greater
critical evaluation of ML or any other educational technology, and at the
same time, more systematic theoretical, methodological and research-practice
engagement is sought (Golonka, Bowles, Frank, Richardson, &
Freynik, 2014; Lin & Lin, 2019; Shadiev, Hwang, & Huang, 2017).
From the theoretical perspective, several published studies lack a specific
theoretical grounding in adoption of ML for their teaching and
learning. Either they are too technocentric or they focus primarily on the
superficial concept of ML (e.g., Göschlberger & Bruck, 2017;
Mohammed, Wakil, & Nawroly, 2018; Park & Kim, 2018; Skalka &
Drlík, 2018). In this regard, we concur with the view of Hew, Lan, Tang,
Jia, and Lo (2019) that “explicit engagement with theory was absent” (p.
956) in most educational technology research, the view of Salas (2017)
that “its [ML] marketing appeal has allowed many vendors and learning
professionals to ignore its lack of theoretical foundation” (para. 7) and
the view of Plonsky (2011) that the lack of theory has led research
on L2 to be based largely on “convenience, intuition, and/or some level
of idiosyncrasy” (p. 998). As a result, various quarters have called into question
the basic tenets of ML that support learning. Other more robust studies
have concurrently called for further in-depth research as this new concept
still lacks a solid empirical basis, including in the field of foreign language
learning (Brebera, 2017).
3. Theoretical proposition for micro-learning
In this section, three theories pertinent to ML are suggested and discussed.
They are: Sweller’s (2020) “Cognitive Load Theory” (CLT),
Mayer’s (2014a) “Cognitive Theory of Multimedia Learning” (CTML),
and Ryan and Deci’s (2017) “Self-Determination Theory” (SDT) of motivation.
Besides supporting the typical alignment of cognitivism with
“Second Language Acquisition” (SLA) (Atkinson, 2010) in comparison
with behaviorism and constructivism, the proposed cognitive theories
also explicitly support the conceptual characteristics of ML. For instance,
CLT is most consistent with the design and development of microcontent
and the manipulation of micro-activities, while CTML is closely
related to the use of mobile and digital micromedia. Moreover, ML is
associated with a student-centered approach because it accommodates the
patterns of media use today and the new generations’ learning needs and
preferences. Looking back on the definitional keywords of ML like personalized,
adaptive, autonomous and self-regulated, they clearly highlight
the importance of the motivational aspect of ML for L2 instruction.
Therefore, SDT is chosen to be the foundation of effective ML that leads
to microknowledge construction.
3.1. Cognitive load theory
From the perspective of cognitive science, learning is said to have taken
place when novel information is successfully stored as a knowledge base
in long-term memory (Sweller, 1994, 2015). The information is novel to
a learner when there is no or limited prior knowledge stored in his/her
long-term memory. Before this storing can happen, novel information
has to be processed and elaborated consciously in working (or short-term)
memory, which is known for its severely limited capacity and duration
(Cowan, 2001; Miller, 1956). According to Miller, working memory
can hold no more than seven elements of information for about
20 seconds at one time during a cognitive task. Therefore, CLT relates
the nature of working memory limitations to learning suggesting that
explicit instruction of novel information should not exceed the cognitive
(working memory) load in order for meaningful learning to occur
(Leppink & Hanham, 2019; Paas, Ayres, & Pachman, 2008). “Cognitive
Load” (CL) is thus defined as the amount of information that working
memory can hold at one time. When the CL of any instructional design
is high, learning will be ineffective. In sum, CLT asserts that students
learn best when the information presented and the processing demands
secure a good level of alignment with the capacity of working memory
of human cognitive architecture (Choi, van Merriënboer, & Paas, 2014;
Paas, Renkl, & Sweller, 2004).
In any learning situation, CL may take one of these three forms:
Intrinsic, extraneous or germane CL (Kalyuga, 2011; Sweller, 2010).
“Intrinsic CL” (ICL) is determined by the inherent complexity of learning
content or task being learned at a given level of expertise. A complex
task usually consists of a high number of elements that interact with
one another simultaneously. According to Sweller (1994), an element
refers to “any material that needs to be learned” (p. 304) and each element
interaction requires working memory capacity. Therefore, ICL is
considered high in such a complex task. In contrast, a simple task
generally consists of a low number of elements with low element
interactivity, hence ICL is considered low. Certain tasks are naturally low
in element interactivity where each element is simple and independent of
every other element, for instance, learning new terms (Sweller, 2020).
Secondly, “Extraneous CL” (ECL) is determined by instructional
designs (Sweller, 2007, 2020), hence CLT suggests that it is important to
reduce ECL in any instructional design as ECL interferes with learning
by imposing unnecessary cognitive demands. According to Sweller, various
studies have shown that well-structured instructional procedures,
which are based on learners’ expertise level in the target content as well
as reduction in the number of interacting elements and redundant information,
can effectively reduce ECL. This reduction minimizes the effort
a learner needs to exert to process instructions which are not related to
learning. Lastly, “Germane CL” (GCL) results from engaging in learning
activities and is needed for “schema acquisition” (Sweller, 1994, p.
296), that is, useful alterations to information stored in long-term
memory that makes up learners’ knowledge base; this load is directly
related to learning.
These three forms of CL provide theoretical justification for ML.
Coherent and self-contained microcontent, on the one hand, is consistent
with CLT in that it intends to reduce ICL by segmenting a fragmentable
complex task into smaller chunks that can be learned independently
in reasonably short bursts. As an illustration, SLA is considered to have
high element interactivity (Sweller, 1994). It may be impossible for students
to deal with all language aspects at one time. In this regard, ML
helps facilitate, for instance, L2 vocabulary learning, in two possible
ways: ML lowers the length of task (difficulty) and hence the element
interactivity (complexity) at the expense of understanding at this stage.
The lower-interactivity microcontent will leave enough working memory
capacity for GCL to take place because, according to Moreno and Mayer
(2007), learning material which is unfamiliar to learners requires much
more cognitive resources for essential processing in working memory.
Once these individual aspects have been learned, their interactions can
be emphasized more readily at the later stages.
On the other hand, well-planned and spaced ML activities intend to
reduce ECL by creating a more concise and organized instructional
design so that learners do not have to utilize their working memory capacity
to impose “an organizational structure” (Sweller, 2007, p. 374) of
the novel information during the cognitive processes. At this point, it is
important to note that this logically organized instructional design is
consistent with the modern definitions of ML (see Section 1). Because
all three forms of CL are additive (van Merriënboer & Sweller, 2005), the
reduction of ICL and ECL will allot adequate working memory capacity
to allow GCL to function at optimum level to facilitate schema acquisition
and automation in long-term memory.
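The additivity argument in the preceding paragraph can be written compactly. The following is only an illustrative formalization and is not notation taken from the cited sources; in particular, the symbol $C_{\mathrm{WM}}$ for working memory capacity is an assumption introduced here.

```latex
\[
  \mathrm{ICL} + \mathrm{ECL} + \mathrm{GCL} \le C_{\mathrm{WM}}
  \quad\Longrightarrow\quad
  \mathrm{GCL} \le C_{\mathrm{WM}} - (\mathrm{ICL} + \mathrm{ECL})
\]
```

Under this reading, lowering ICL (through segmented microcontent) or ECL (through well-organized activities) raises the upper bound on the capacity available for GCL. It does not by itself guarantee that the freed capacity is actually used for schema acquisition.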
3.2. Cognitive theory of multimedia learning
The use of technology in ML leads to the concept of “multimedia
learning” (Mayer, 2014b, p. 2). Although Mayer’s concept of multimedia
learning may not involve technology, in this paper multimedia represents
a “potentially powerful learning technology” (p. 13) considering the
omnipresence of technology today and its attendant benefits across a
wide range of domains. In addition, this paper intends to extend the
multimedia concept to more advanced media including digital mobile
devices as advocated by Hug (2010). According to Butcher (2014),
“opportunities to embed multimedia content in broader contexts and to
develop personalized multimedia content will push the boundaries of the
multimedia principle” (p. 174) as technology keeps evolving and advancing.
Nonetheless, we concur with Mayer’s (2014b) focus on “learner-centered
approaches” (p. 15) to foster meaningful L2 learning.
CTML posits that “people learn more deeply from words and pictures
than from words alone” (Mayer, 2014a, p. 43). This statement is also the
underlying principle of CTML, namely the multimedia principle, which
describes how people attempt to actively construct meaningful connections
by selecting relevant words (spoken or printed) and pictures (static
or dynamic) via the auditory and visual channels in working memory,
organizing the selected words into a verbal mental model and the
selected pictures into a pictorial mental model, and finally integrating
the two mental models with existing knowledge from long-term memory
into a coherent mental representation.
There are three key assumptions that underlie CTML. Firstly, the dual-channel
assumption explains how multimedia information first enters
learners’ processing systems. In general, learners process pictorial information
in the visual channel and verbal information in the auditory
channel. As the information will only be held for a very short time in
these channels due to the high decay rate in working memory, presentation
of on-screen text and pictorial information in the visual channel
alone is believed to overload learners’ cognitive capacity due to the split-attention
effect (Moreno & Mayer, 2007; Sweller, 2007). Hence, it is
logical to infer that multimedia microcontent that employs graphics and
narrations simultaneously via technology will enable learners to effectively
utilize the working memory capacity for integrative processing of
verbal and pictorial information (Mayer, 2014a; Schnotz, Fries, & Horz,
2009). This assumption also justifies the multimedia principle. Besides,
advances in technology today afford multiple ways to creatively present
the L2 content, for example, mini grammar lessons, thus allowing learners
to attend to the fleeting information more successfully.
Secondly, the limited-capacity assumption points out that working memory
can only hold a limited number of elements at any one time, hence
microcontent is most consistent with the concept of limited capacity and
durability as detailed in Section 3.1. During the ML activities, learners
are provided with repeated opportunities to actively process (select,
organize, integrate) the microcontent in order to foster meaningful learning
(Mayer, 2017; Sorden, 2012). This constitutes the third assumption,
the active-processing assumption, and it resembles an example of constructivist
epistemologies in the scope of SLA. The application of these
assumptions in ML activities is believed to help language learners in several
ways: accelerating the selection of relevant visual and auditory information
based on their domain-specific knowledge schemas and previous
learning experiences, activating different language learning strategies to
effectively organize the information, and integrating the organized information
with their prior knowledge into a new coherent mental representation.
In this regard, Mayer (2017) offers three instructional goals to
facilitate the three cognitive processes: Reducing extraneous processing,
managing essential processing and fostering generative processing.
3.3. Self-determination theory of motivation
Mayer’s (2014b) learner-centered approaches in CTML, Sweller’s (2007)
suggestion to include motivational variables in CLT and Bodnar,
Cucchiarini, Strik, and Van Hout’s (2016) motivational approach to computer-assisted
language learning lead the researchers to draw on a third
theory, SDT, to further support the concept of ML. The aforementioned
instructional and technical considerations of ML will become meaningless
without motivation because motivation determines “the direction,
intensity and persistence of behavior” (Schnotz et al., 2009, p. 70).
Deci and Ryan (2008) theorize an individual’s motivation as self-determined
and broadly classify it into three orientations: Autonomous motivation,
controlled motivation, and amotivation. Autonomously motivated
individuals experience “volition, or a self-endorsement of their actions”
(p. 182) while individuals with controlled motivation engage in an action
because of external contingencies. Amotivation implies a lack of intention
and motivation. Similarly, motivation can also be classified into
intrinsic and extrinsic motivation. With reference to Ryan and Deci
(2002), intrinsic motivation refers to a self-motivated state where an
individual does something for its inherent satisfactions, whereas extrinsic
motivation refers to doing something in order to attain external instrumental
values such as a reward. It is generally agreed that intrinsic
motivation is a classic example of self-determination and it constitutes
an important basis for learning. However, not everyone finds learning
inherently interesting and enjoyable. Therefore, some people need other
incentives or reasons to learn, namely extrinsic motivation. According to
Ryan and Deci (2000), SDT postulates four types of extrinsic motivation
(external regulation, introjected regulation, identified regulation and integrated
regulation) that are spread along a continuum of relative autonomy.
All these motivational orientations are “predictors of performance,
relational, and well-being outcomes” (Deci & Ryan, 2008, p. 182) and are
described by a sub-theory of SDT called Organismic Integration Theory.
Another sub-theory of SDT, Cognitive Evaluation Theory, proposes
that the degree to which all types of motivation are enhanced or diminished
is conditioned by the level of satisfaction of three innate psychological
needs: Autonomy, competence, and relatedness (Deci & Ryan,
2000). The need for autonomy relates to the volition to self-initiate and
self-endorse one’s own actions; the need for competence relates to the
propensity to feel effective in attaining valued outcomes; and the need
for relatedness relates to the desire to feel connected to important others.
Research across domains has confirmed that the higher the satisfaction
of these psychological needs, the more autonomous the motivational orientations,
thus yielding greater performance or positive outcomes among
learners (e.g., Chen & Jang, 2010; Deci & Ryan, 2008; Niemiec & Ryan,
2009; Nikou & Economides, 2018). In relation to L2 instruction, ML
attempts to satisfy learners’ need for autonomy by affording an adaptive
and autonomous L2 learning environment where learners decide what,
when, where, and how to engage in or repeat certain ML activities.
Besides, optimally challenging L2 tasks with well-organized scaffolding
and timely feedback via micromedia seek to satisfy learners’ need for
competence. Although ML is favorable for individual learning (e.g., self-practicing
specially designed L2 pronunciation accuracy), it can also be
operated interactively in groups (e.g., via social media) respecting L2
learners’ emotional needs and interests, hence satisfying their need for
relatedness. Depending on the learning objectives and contexts, ML helps
satisfy the need for autonomy, competence, and relatedness, hence motivating
learners to autonomously engage in their L2 learning.
4. Theoretical model of micro-learning
After establishing linear relationships between the proposed theories and
ML, a theoretical model encompassing the dynamic interplay of cognitive
and motivational theories within the concept of ML is devised to offer a
more principled way to integrate ML into L2 instruction across diverse
languages and learning contexts as illustrated in Figure 1. To our knowledge,
the causal relationships among these theories have not been explicitly
introduced in the literature.
In this model, CTML focuses on creating a micromedia environment
using various technology affordances that mainly adheres to the multimedia
principle, while CLT centers on designing the architecture of
digital microcontent, where ICL is purposefully optimized to reduce the
element interactivity and hence lower the overall complexity of learning
material for a given L2 learning goal. These two cognitive theories
explain how ML responds to “two major problems concerning learning”
(Abel, Moulin, & Lenne, 2006, p. 275): (1) the overflow and complexity
of information by providing a series of structured microcontent, and (2)
the time and place to learn by offering a conducive micromedia environment
that adapts to learners’ profile and agenda. Meanwhile, SDT in this
model considers the question of how CTML and CLT manage to engage
learners to gain motivation to begin and continue to invest effort for
subsequent microknowledge construction. It is important to note that L2
instruction does not occur in isolation, but rather within different sociocultural
contexts. According to Reeve, Ryan, and Deci (2018), SDT also
seeks to elucidate how sociocultural conditions can support or thwart
human engagement via the satisfaction of the basic needs. From the
sociocultural perspective, CLT and CTML of this model would act as
cultural artifacts to mediate L2 learning (Lantolf, 2000, 2006). Along
with the knowledgeable others (teachers and peers) involved, these are
expected to satisfy the basic needs of SDT, and ultimately lead to an
effective ML. The unique interplay of these three theories in L2 instruction
will be discussed next.
Figure 1. Theoretical and structural model of ML for L2 instruction.
According to CTML, dual-channel-based ML is considered to have the
ability to influence learners’ motivation via the satisfaction of the three
innate psychological needs of SDT. For instance, ML which simultaneously
activates both the visual and auditory channels facilitates integrative
processing and subsequently prompts learners to process the verbal
and pictorial information more effectively. The success in this information
processing is expected to make learners feel more confident and
competent when they eventually meet the learning challenges. This, in
turn, triggers their willingness to continue investing more effort to
engage in appropriate active processing of CTML. In other words, active
processing may not occur without motivation even if the cognitive
resources are made available (Moreno & Mayer, 2007). This argument is
particularly true for novice language learners as images (besides narrations)
may serve as scaffolding that helps accelerate meaningful connections
between new L2 knowledge and prior L1 (first language)
knowledge to construct the desired coherent mental representation in
working memory. This is because novice learners lack well-automated
schemas to direct the selection and organization of relevant vocabulary
in early L2 learning. As for more advanced learners, Schnotz et al.
(2009) argue that the use of images may increase extraneous CL due to the
“redundancy effect” (Sweller, 2007, p. 376). This argument also corresponds
with the findings of Jiang, Renandya, and Zhang (2017) which
highlight the “expertise reversal effect” (p. 738) in instructional material
design. However, we would maintain that images may serve as “seductive
details” (Schnotz et al., 2009, p. 82) or “multimodal annotations” (Boers,
Warren, Grimshaw, & Siyanova-Chanturia, 2017, p. 709) to promote and
sustain advanced learners’ attention to engage in more complicated L2
learning, like syntax and pragmatics.
At this juncture, it would be worthwhile to contemplate Moreno and
Mayer’s (2007) “Cognitive-Affective Theory of Learning with Media”
(CATLM), which is an expansion of CTML that integrates affective variables
into the theory. One of the seven assumptions of CATLM denotes
that “motivational factors mediate learning by increasing or decreasing
cognitive engagement” (p. 313). CATLM’s motivational factors are consistent
with the “Expectancy-Value Theory” (EVT) of motivation
(Wigfield & Cambria, 2010) which focuses on learners’ beliefs in expectancy
of success and task value. However, SDT is chosen to underpin the
theoretical model because it operates on a “higher level of generality
than those of EVTs” (Savolainen, 2018, p. 137) based on the operational
characteristics of ML. One could therefore logically infer that SDT, a
broad theory of human motivation, can adequately justify why people in
all cultures do what they do and how they continue to invest effort in
the action (Ryan & Deci, 2019).
Within the scope of CLT, ML which reduces ICL and ECL is expected
to help internalize learners’ extrinsic motivation by satisfying the need
for autonomy, competence, and relatedness. Nonetheless, impetuous
reduction of these CLs does not necessarily equate to an increase in
GCL. According to Schnotz et al. (2009), freed-up working memory capacity
can only be used for GCL to a limited extent. Moreover, learners
are generally opportunistic and do not spontaneously engage in higher
order cognitive processes. Hence, we suggest a judicious manipulation of
ICL and ECL in exchange for learners’ motivation because motivated
learners will instinctively apply more effective learning strategies even in
the face of concurrent distraction, which will result in higher quality of
GCL within the learning conditions for schema acquisition. Besides the
dual-channel assumption mentioned above, this can be achieved by using
some working memory capacity for purposeful ECL, for example, integrating
some seductive details or SLA principles into microcontent development.
As an illustration, Yu, Zhu, Yang, and Chen’s (2019) mobile
language learning platform, which incorporates seductive details like mobile
ubiquity and student needs in the content development, is reported to promote
student satisfaction and learning outcomes. In terms of SLA principles,
we argue that well-organized instructions that not only apply CLT’s
effects and CTML’s principles but also give the right kind of control to
students (Ellis, 1999) may satisfy the need for autonomy. Besides, integrating
Krashen’s (1982) comprehensible input into ML design may satisfy
the need for competence and scaffolding Vygotsky’s (1978) “Zone of
Proximal Development” (ZPD) in ML instructions may satisfy the need
for relatedness. For instance, Li, Cummins, and Deng (2017) reveal that
their comprehensible texting-based intervention which affords definitions
of target words and sample sentences supports English language learners’
“incremental vocabulary growth” (p. 830) and Rassaei (2019) foregrounds
that incorporating dynamic feedback which is tailored to individuals’
ZPD maximizes “learner engagement and collaborative performance” (p.
604) during L2 development sessions. In any case, the essential learning
content must align with learners’ level of expertise, age, needs, cultural
background, learning objectives and second-foreign language contexts for
effective ML to take place.
These correlations among CTML, CLT, and SDT are consistent with
Bikowski and Casal’s (2018) Framework for Learning with Digital
Resources, which expounds the inter-relationship among learner (who),
content (what) and technology (how). Besides establishing the interrelationships
between cognitive and motivational theories, this CTML-CLT-SDT
model helps justify the practicality of ML as a successful learning
approach in prior research in diverse domains. Consequently, this model
responds to the calls for a proper theoretical framework in educational
technology (e.g., Ballance, 2012; Graham, 2011) by proposing an
amalgam of three theories that offers appropriate theoretical support
for future empirical research on ML, not only in the field of L2 instruction
but possibly also across different disciplines.
5. Application of the theoretical model
In this section, three published mobile ML examples are discussed to
provide a practical way to understand the application of the proposed
theoretical model of ML for L2 instruction. In line with Ottoson’s
(1995) definition of application, “to put a thing into practical contact
with another” (p. 22), the application of this model is a purposeful process
that puts the theories (model) into practical contact with the practices
(ML) in different L2 learning contexts. In other words, it is not
limited to descriptively analyzing previous or actual ML apps (which refer
to pieces of software designed for a particular purpose) using this model;
applying it as a foundation to create future MLs is strongly encouraged.
Duolingo is “the most popular [app] in the category of Education in
Google Play” (Nushi & Eqbali, 2017, p. 95) with “over 120 million users”
(Crowther, Kim, & Loewen, 2017, p. 21) and more than 300 million to
date (www.duolingo.com/press). The success of this free mobile L2 ML
can be elucidated in relation to the theoretical model. In terms of ICL,
Duolingo offers learners four different daily goals ranging from a five-minute
(casual) to a twenty-minute (insane) vocabulary lesson. These bite-sized
learning snippets not only go well with millennials’ learning preferences,
but also adhere to the concept of limited working memory capacity through
the lens of CLT. With reference to Sweller (1994), L2 vocabulary learning
is a typical example of low element interactivity and hence possesses low
ICL. In this regard, Duolingo judiciously manipulates the ICL by optimizing
the task complexity of each lesson based on the selected goals.
This resonates with the segmenting principle of CTML whereby multimedia
lessons are available in small user-paced segments. Similarly,
another two L2 MLs are also highly consistent with the manipulation of
the ICL. QuickLearn (Dingler et al., 2017) offers only three new words
per set of informal open ML per day for the public (via Android) to learn
L2 vocabulary, while Baldauf et al.’s (2017) gamified blended ML
prototype presents five words per activity (quiz or challenge) tailored for
Austrian secondary school students to learn English in a real-world context.
These ICL features clearly underline the conceptual characteristics
of ML.
With respect to ECL, Duolingo adopts a simple and salient design with
quality animations, graphics and texts, user-friendly interfaces with an
easy-to-follow navigation procedure and systematic manipulation of input
that cater to a wide range of users of different ages and cultures. This
instructional design is found to support CTML’s core principle, the multimedia
principle (most visual input has audio representation), and additional
principles including the coherence principle (simple design without
redundant information), signaling principle (enhanced color manipulation
and typographical effects), spatial contiguity principle (on-screen
texts appear right below the images) and temporal contiguity principle
(narrations and learning elements can appear concurrently). Unlike
Duolingo, QuickLearn’s and Baldauf’s designs do not fully capitalize on
CTML principles to control ECL. For example, QuickLearn claims to
provide well-received display modes and interaction modalities including
push notifications as session triggers to promote L2 vocabulary learning.
Despite numerous cutting-edge technology affordances, QuickLearn uses
text-only display modalities without any auditory support to learn
L2 vocabulary.
The purposeful controls of ICL and ECL in these MLs, according to
the model, are expected to help internalize learners’ extrinsic motivation
because L2 learners may not readily use the freed up working memory
capacity for GCL (corresponding to active-processing of CTML). To promote
learner engagement in active cognitive processes, there are other
cognitive-technical affordances in these MLs which show the dynamic
interplay of CTML-CLT-SDT. According to Nushi and Eqbali (2017) and
Crowther et al. (2017), Duolingo provides choices for L2 learners
throughout the learning experience from setting a daily goal, choosing
relevant lessons and micromedia to deciding the best learning time, level
and pace. Likewise, QuickLearn and Baldauf’s prototype render similar
controls for L2 learners to self-initiate their learning (the need for autonomy).
To satisfy the need for competence, all three L2 MLs provide optimally
challenging tasks based on learners’ expertise level, scaffolding like
hypertexts, tips and notes, grammatical supports and slow-paced aural
repetitions, automatic metalinguistic feedback and achievement rewards.
For instance, Nushi and Eqbali (2017) find that “Duolingo points out the
mistake and repeats the question at the end of the exercises until the
progress bar is completed” (p. 92) and QuickLearn affords users the feeling
of completion and accomplishment in minimum time. To satisfy the
need for relatedness, Duolingo enables adding friends, competition with
peers and online discussions to promote social interaction and collaboration
while Baldauf’s prototype facilitates class dynamics and social interactions
among Austrian learners of English.
In sum, the satisfaction of the need for autonomy, competence, and
relatedness (SDT), be it via the controls of ICL and ECL (CLT) or leveraging
from distinct technology affordances (CTML), is expected to
motivate L2 learners to be more autonomously engaged in cognitive
processing that eventually contributes to schema acquisition. This is evident
in Duolingo, as Nushi and Eqbali (2017) discover that “even with 5 to
10 minutes of daily practice, Duolingo helps learners feel they have
accomplished something, a feeling that keeps them motivated” (p. 95).
While the ML examples have been well received by the L2 learners and
positively related, to a certain extent, to the theoretical model under different
conditions, it is not a one-size-fits-all approach considering the
complexity of SLA. In this regard, García Botero, Questier, and Zhu
(2019) report difficulty in sustaining self-directed L2 learning with
Duolingo among university students in an out-of-class context. Besides,
Rachels and Rockinson-Szapkiw (2018) claim that there is no significant
difference between Duolingo and face-to-face L2 instruction. Similarly,
Dingler et al. (2017) find no significant difference between the two text-only
(flashcards and multiple choice) display modalities. Two possible
explanations for such results in these studies are (1) the misalignment
between the MLs and learners’ level of expertise, age, needs,
cultural background or learning goals, and (2) the discordance between
MLs and the proposed model.
6. Implications for research and practice
This paper proposes a theoretical model for ML which could positively
impact L2 instruction across domains. Overall, it cautions
against focusing too narrowly on the pervasive power of state-of-the-art
technology and redirects the attention to L2 and practice-relevant theories
that will ultimately inform a new concept like ML and open the possibility
for more theoretical rigor in future research. By explicitly
engaging ML with relevant theories, language planners, researchers and
practitioners would be able to link “abstract categories and connections
underlying familiar phenomena” (Widdowson, 1990, p. 56) in order to
facilitate meaningful L2 teaching and learning.
In terms of implications for research, the proposed theoretical model
helps make sense of cognitive and psychological phenomena resulting
from ML and offers a ready-made framework for L2 researchers and
practitioners to directly investigate this technology-mediated pedagogical
approach simultaneously from the CTML, CLT and SDT perspectives.
Since multiple perspectives are mostly needed to account for the complexity
of L2 learning, researchers can underlay any intended ML with
this tripartite model not only to obtain a more holistic and complete
view of the learning approach, but also to examine which dimensions of
the ML better support learning in a particular context. In this regard,
“Structural Equation Modeling” (SEM) can be used to estimate the structural
relationships in the model. Moreover, the model also helps avoid
confusion regarding which theories to refer to as there are many overlapping
elements among various learning theories and research using this
model may shed light on how ML may support, complement, and enrich
SLA theories. We hope that this model will inspire and guide productive
lines of L2 inquiry in different learning contexts to offer enriching views
on theory-practice relationships including the basis to develop instruments
to measure the effects of ML. Finally, we concur with the view of
Curcic, Wolbers, Juzwik, and Pu (2012) that “explicitly articulated theoretical
groundings help in communicating research and findings to a
diverse field of professionals” (p. 828), and hence it is our intention to
encourage more realistic and contextual studies to support or refute this
untested model not only in L2 domain but also across various
With regard to implications for practice, the theoretical model can
help L2 teachers understand which internal learning conditions are
affected in ML, hence encouraging them to creatively design their
learner-focused micro-instructions. For example, important information
which receives little or no attention in the initial phases of learning is
likely to decay swiftly, therefore teachers may provide pertinent micromedia
support (e.g., acoustic, visual or textual modes) in directing students’
attention to this information in profitable short episodes (CTML
perspective). Any cognitive overload can be guarded against by limiting the
number of new variables at any moment throughout the learning process
especially among novice learners to foster schema acquisition (CLT perspective).
Adequate comprehensible microcontent, scaffolding, recurring
rehearsal, and timely feedback via the use of micromedia may alleviate
the possible affective inhibitions such as anxiety and motivate learners
toward continued progress in learning (SDT perspective). The said problems
and related forms of instruction are not uncommon in L2 classrooms.
Therefore, employing the model, on the one hand, allows L2
teachers to reflect on their teaching practices simultaneously from different
theoretical perspectives of language learning and, on the other hand,
urges them to be eclectic in making informed choices of any teaching
approaches like ML to foster principled pragmatism (Kumaravadivelu,
1994). Moreover, teachers may be curious about which variable of ML
best predicts which desired learning outcome (e.g., language skills, lexis,
grammar, vocabulary, pronunciation, autonomy, and motivation) among
their students, or whether any other variables may
interfere with the ML processes in different learning contexts.
7. Conclusion
The main aim of this paper is to propose a theoretical foundation for
the emerging concept of ML which has been highlighted as a successful
learning approach in different aspects of learning, didactics, and education.
It is also intended to move beyond a superficial understanding of
ML to avoid overpromising on its potential, first by comprehending the
criticisms of the notion and, subsequently, by building on and expanding previous
efforts to provide a better understanding of ML and its current
state of the art in L2 instruction. In response to these efforts, a brief
background on ML across different disciplines allows us to critically
assess the common benefits and pitfalls derived from the adoption of
ML in the general teaching and learning enterprise. Based on these premises,
three established theories (CLT, CTML and SDT) are explicitly suggested,
discussed and related to ML based on a careful examination of its conceptual
characteristics and empirical observations.
In the words of Deutschmann and Vu (2015): “in the triangle of theory,
research, and practice of a domain of inquiry, it is theory that serves
as the underpinning for the other two” (p. 44). Therefore, this theoretical
proposition has provided deeper insights into the nuanced dynamics of
ML, not only recognizing the significance of the theory-practice relationship
in moving the ML research forward, but also justifying the practical
use of ML in learning in general and in SLA in particular. Gleaning
from these insights, a theoretical model of ML was devised to offer a
principled way to adopt ML fundamentally in L2 education, and at the
same time, help address the four criticisms that have surfaced in a more
systematic manner. Firstly, a more structured operational definition of
ML has been adopted after extensive reference to the modern conceptualization
of ML. Secondly, the model foregrounds that a true ML does
not simply break down the learning content into smaller doses, but does so with
effortful deliberation. Thirdly, ML is not an end in itself, but a means
for creating macro-level knowledge which is evidently more complex.
Lastly, principled pragmatism fostered by the model may reduce the possible
overreliance and misuse of ML by teachers and learners. As a result,
it can be suggested that L2 teachers and researchers should
simultaneously consider the cognitive and motivational domains of ML
which can be further compartmentalized into technical, instructional
design and learner motivational aspects in designing any future MLs in
order to achieve the effectiveness anticipated.
Though the application of theories to support ML in L2 instruction is
evident, it is important to note that the proposed model does not address
the full range of L2 learning phenomena and the interplay of the ML
variables has yet to be confirmed empirically, hence it is open to substantiation.
As a final remark, learning does not occur simply because content
is made available in a smaller chunk than it previously was. ML
alone may be a bad idea, as Shackleton-Jones (2016) claims; however, we
would maintain that a theory-based ML could and should be treated as
another viable pedagogical tool, but never as a complete replacement of
formal L2 instruction. Moving beyond a one-size-fits-all approach, more
robust research and alternative views are invited to provide solid empirical
evidence on whether ML can serve as a reliable, practical, and
meaningful learning approach and stand the test of time in distinctive
conditions, domains, and levels of L2 instruction.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes on contributors
Hou-Keat Khong is a PhD candidate in the School of Educational Studies at the
Universiti Sains Malaysia. His research interests focus on computer-assisted language
learning including pedagogical innovations, motivation and strategies.
Muhammad Kamarul Kabilan’s research interests include the use of technology and
social media for English language education, professional development and critical practices
in English language teacher education. Currently, he serves as a board member of
the British Journal of Educational Technology.