What Research is Needed about Gestalt Language Processing and Natural Language Acquisition?

Proponents and critics of NLA and GLP generally agree that more research is needed, as many of the foundational claims behind GLP and NLA are not supported by research. Here, I outline some ideas of what research is needed, and roughly in what order. All of these study types should be eligible for publication in a peer-reviewed journal.

If you are interested in conducting such a study and don't know where to begin, please reach out! I am happy to provide support in study design, execution, data analysis, and writing at no charge if it helps get studies on NLA and GLP out there.

To my knowledge, there has only been one study of NLA that had ethics oversight and an outcome measure, and its results have not been published or shared. I am not aware of any additional ongoing research projects investigating NLA, regardless of what is claimed on social media. If I am incorrect, please contact me at support@gestaltlanguageprocessing.org and I will update this page.

The dissemination of information about NLA and GLP has led to many parents learning meaningful ways to support their children. It has probably improved the therapy practices of many SLPs. At the same time, many SLPs and parents who find benefit in NLA/GLP were previously receiving outdated training or advice; additionally, the explanations for why NLA supports language learning are disconnected from the science. Research on NLA/GLP should begin with an up-to-date and in-depth understanding of the current state of research on language acquisition, language processing, research methods, and speech-language therapy. NLA and GLP respond to real gaps in our understanding of language development as of the early 1980s, and correct real problems with outdated therapy practices such as trying to extinguish echolalia. However, extensive research throughout my lifetime has addressed many of those gaps, and of course, uncovered new gaps in our understanding. While this disconnect from the research does not inherently make claims about NLA untrue, it means they often misrepresent the state of current research and evidence-based practice. Because research should build on existing knowledge, and fill gaps in our current knowledge, it is crucial to consider what is already known before planning additional studies. This requires acknowledging that there is an entire field dedicated to the study of language processing (psycholinguistics).
We must remind ourselves that the purpose of research is not to find evidence in support of a particular idea. Research should start with understanding what is currently known, identify gaps in that knowledge, and seek to fill those gaps using the method(s) that are most appropriate for a specific research question. Wanting the best for our clients means remaining open to rejecting ideas we currently hold if we come across something that will serve our clients better.
Before research can be conducted on Gestalt Language Processors, or children who appear to be developing language in a gestalt way, we need a definition of GLP. This is important for identifying inclusion criteria for kids in a study, and for knowing which children are appropriate for therapy using the NLA protocol. Clear definitions are important so that if a parent learns about a certain finding that relates to GLPs, they know whether to expect that finding to apply to their child.
As it currently stands, there is no standardized definition or set of criteria for a GLP. The speech-language pathologist who coined the term recently provided the following response when asked to define GLP in a podcast interview: “my definition is that it is whatever it is for an individual who’s doing this.” In a personal conversation, she told me, “I’m not good with the idea that we can even have definitions.” She and other proponents of NLA have reiterated a refusal to provide a definition in the comments of this Instagram post. A clinical label is not useful without a definition.

Currently, many blogs and other sources list signs a child might be a GLP, often without a definition. Other sources define GLP simply as a child who uses delayed echolalia. While many of us have an "I know it when I see it" approach to identifying GLPs in the early stages (and even I, who have been called "some gestalt curmudgeon," know what you're talking about), advancing the research requires a definition and clear criteria for recognizing GLPs.
Validity of a definition or diagnosis deals with showing that we are labeling the concept that is most useful, and that we have a good reason for doing it. Questioning the validity of a definition is not the same thing as questioning the validity of the people, or the experiences of the people, who are described by that definition. For the case of the GLP label, we would need to show that individuals called GLPs do process language in a chunkier way than individuals who are not called GLPs. This might require us to familiarize ourselves with the methods of language processing research, and conduct language processing experiments. Alternatively, this process might lead us to choose a different word that doesn't imply claims about language processing (there is literally a whole field of study called psycholinguistics that deals with this). We would also want to show that being labeled as a GLP is useful for planning therapy and educational programming. What kind of evidence do we need to show that GLPs benefit from NLA and not from other techniques, while ALPs (analytic language processors) benefit more from those other techniques than from NLA? If we cannot show that identifying a gestalt language processing style is useful for clinical decision-making, then we need to ask why SLPs are providing this label for children.
Showing the validity of the label GLP also requires showing that the characteristics of GLPs are actually more common in people who are GLPs than those who are not. This requires connecting with the research on typical language development. While there is great value to studying autistic language in its own right (i.e., without comparison to neurotypical norms), if we want to make claims like "X is a characteristic of GLPs," we should first make sure that X is actually more common in GLPs than in non-GLPs.
All that said, many autistic adults have embraced the term gestalt language processing as an expression of autistic identity around echolalia use. I think that's great! I don't want to take anyone's identity away. And if speech therapists find that certain clients learn new skills better when presented in longer phrases vs single words, making clinical decisions for that client based on that observation is part of evidence-based practice. Those are not the same thing as offering a clinical label and making decisions based on the label.
Once we have a definition of GLP, and clear criteria for identifying GLPs, we need to show that GLPs can be identified reliably.
Inter-rater reliability: This involves having two clinicians assess the same set of children to identify whether they are GLPs. Then we calculate how often the two clinicians agree with each other. Studies of inter-rater reliability need to consider many factors. For example, what kind of training did the clinicians have? Showing that two SLPs whose practices are devoted to GLPs and who often work closely together make the same judgements is a great first step, but we also need data on agreement between SLPs with more typical amounts of training/experience. It also matters what kids are in the reliability sample. For example, we might expect high inter-rater reliability for early communicators whose verbal output is all long quotes from media. However, what is inter-rater reliability for nonspeaking children? Children who appear to produce a lot of self-generated language? Inter-rater reliability is important for any clinical label. Since clinical labels are used for treatment planning, it is not helpful to have a label that is not consistently applied. Low reliability can mean we need to go back to earlier steps and revise the identification criteria, or it could mean that we are trying to apply a label that isn't as necessary as we thought.
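As a rough illustration of the agreement calculation described above, the sketch below computes simple percent agreement between two clinicians, along with Cohen's kappa, a standard statistic that adjusts for the agreement two raters would reach by chance. The labels are made up for illustration, and the function names are my own; a real study would need to choose its statistic and sample deliberately.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Share of children on whom the two clinicians gave the same label."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both clinicians must rate the same non-empty set of children.")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: probability both clinicians pick the same label
    # if each labeled at their own base rate, independently of the child.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Undefined when both clinicians used one identical label throughout.
        return float("nan")
    return (observed - expected) / (1 - expected)


# Made-up labels for 10 children: True = identified as a GLP, False = not.
clinician_a = [True, True, False, False, True, False, True, True, False, False]
clinician_b = [True, True, False, True, True, False, False, True, False, False]

print(f"Percent agreement: {percent_agreement(clinician_a, clinician_b):.2f}")
print(f"Cohen's kappa: {cohens_kappa(clinician_a, clinician_b):.2f}")
```

Percent agreement alone can look high when nearly every child in the sample receives the same label, which is one reason the makeup of the reliability sample matters.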
Test-retest reliability: This isn't actually about testing. You don't have to test the kids on your caseload. It's just the name of the kind of reliability calculation that deals with the stability of a label over a relatively short period of time. If a child is identified as a GLP today, will they also be identified as a GLP in two months?
As with identifying GLPs, we need data showing that language samples can be reliably scored using the six stages of NLA. Ideally, the results would be submitted for peer review and the de-identified data set would be made available. This kind of study absolutely can be published in a peer-reviewed journal.
Inter-rater reliability: If two clinicians score the same language sample, how often do their scores agree? We could calculate this multiple ways: we could go utterance by utterance, calculating on what percent of the utterances their scoring agrees. We could also calculate it at the level of the language sample: what stage did the clinician put the child in? In such a study, we would want to make sure both clinicians have access to the same information, and study designers would need to think carefully about what kind of information that should be. For example, is a written transcript and a few paragraphs of language history adequate? Or would they both need to watch the same video of the session? What follow-up questions can clinicians in the study ask? Study designers would also need to consider how to make sure that language samples used for the reliability study are representative of the kinds of language samples that get obtained in real clinical scenarios. Finally, study designers would need to consider what kind of training SLPs should have before scoring language samples in this study.
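The two levels of calculation described above could be sketched as follows. Everything here is hypothetical: the data layout, the names, and the scores are invented for illustration, and the sketch assumes each clinician records both a stage for every utterance and one overall stage for the sample.

```python
from collections import Counter


def agreement_rate(codes_a, codes_b):
    """Fraction of items that two clinicians coded identically."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both clinicians must code the same non-empty set of items.")
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)


def disagreement_pairs(codes_a, codes_b):
    """Count which (clinician A stage, clinician B stage) pairs were confused."""
    return Counter((a, b) for a, b in zip(codes_a, codes_b) if a != b)


# Made-up data: per-utterance stage scores from both clinicians, plus the
# overall stage each clinician assigned to the child for that sample.
samples = {
    "sample_1": {"utterances_a": [1, 1, 2, 2, 3, 1], "utterances_b": [1, 2, 2, 2, 3, 1],
                 "stage_a": 2, "stage_b": 2},
    "sample_2": {"utterances_a": [3, 4, 4, 4], "utterances_b": [3, 3, 4, 3],
                 "stage_a": 4, "stage_b": 3},
}

# Level 1: utterance-by-utterance agreement within each sample.
utterance_level = {
    name: agreement_rate(s["utterances_a"], s["utterances_b"])
    for name, s in samples.items()
}

# Level 2: agreement on the overall stage assigned to each sample.
sample_level = agreement_rate(
    [s["stage_a"] for s in samples.values()],
    [s["stage_b"] for s in samples.values()],
)

# Where do the clinicians disagree? Here, stage 4 vs. stage 3 in sample_2.
confusions = disagreement_pairs(
    samples["sample_2"]["utterances_a"], samples["sample_2"]["utterances_b"]
)

print(utterance_level)
print(f"Sample-level agreement: {sample_level:.2f}")
print(confusions)
```

Tallying which stage pairs get confused can point to the specific scoring criteria that need clearer wording.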
Test-retest reliability: Again, this isn't so much about testing. It's just the term for the kind of reliability that measures stability over time and settings. If a child's language sample is scored as stage 4 this week, and then stage 2 next week, and back to stage 4 the week after that, can we really say that the child is "in" one of those stages? Of course, we hope for changes over time as children progress, but that would be a monthly progression like 2-2-2-2-2-2-2-2-3-3-4-4-4-4-4. If we see month-to-month variation like 2-4-3-4-4-2-2-3-4-5-2-3-2, that could tell us that either we don't have a reliable way of identifying stages, or that the stages aren't the best way to measure language growth and plan treatment.
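One simple way to put numbers on the contrast between those two sequences is to count how often the assigned stage changes from one sample to the next, and how often it moves backward. This is only a descriptive sketch with a made-up function name, not a formal test-retest statistic.

```python
def stage_changes(stages):
    """Summarize step-to-step movement in a chronological list of stage scores."""
    steps = list(zip(stages, stages[1:]))
    return {
        "changes": sum(later != earlier for earlier, later in steps),
        "regressions": sum(later < earlier for earlier, later in steps),
    }


# The two monthly sequences from the paragraph above.
steady = [2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4]
erratic = [2, 4, 3, 4, 4, 2, 2, 3, 4, 5, 2, 3, 2]

print(stage_changes(steady))   # {'changes': 2, 'regressions': 0}
print(stage_changes(erratic))  # {'changes': 10, 'regressions': 4}
```

The steady sequence changes twice and never moves backward; the erratic one changes at 10 of its 12 steps and moves backward 4 times.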
Why this matters: Because decisions of what to model are based on identifying a child's stage of NLA, it's really important to show that the stage of NLA can be reliably identified. If Clinician A considers a child to be at stage 2 and Clinician B identifies that same child as stage 4, they will then come up with different treatment plans. If modeling at the child's current stage is truly necessary for progress, then a child whose stage is misidentified will not be expected to benefit as much from therapy.
To justify claims that there is a spectrum of processing styles from more analytic to more gestalt, we need studies that use methods of studying language processing. We would need to come up with an experiment where we have reason to believe that a gestalt-y processing style should lead to a particular result, and an analytic processing style would lead to a different result. Then we would need to conduct that experiment, and then show not only that there is a spectrum (or categories) of processing styles, but that those processing styles correspond to the clinical observations associated with ALP and GLP labels.
On average, there absolutely are differences in language use and experiences between autistic individuals and allistic or neurotypical individuals. To date, studies of sentence processing comparing NT and autistic individuals have not found differences in sentence processing between the groups. Differences in other cognitive processes have been suggested as the explanation for linguistic characteristics of autistic individuals.
Treatment fidelity asks: how do we know a therapy session is actually following the NLA Protocol? This is important because if we want to study the effectiveness of the NLA Protocol, we need a way of knowing that the kids getting NLA therapy are actually getting NLA therapy, and the kids in the control group are not. Similarly, if there is evidence that NLA is effective, therapists need a way to know that they are implementing it in the way that has been shown to be effective. Parents should also have access to a way to know that their NLA-trained therapist is actually doing NLA.
Fidelity guidelines often consist of a checklist of characteristics that should be present (or not present) in a session for it to "count." A great deal of thought must go into fidelity checklists so that they capture the essence of a therapy while allowing for the therapist to exercise flexibility and clinical judgment. An item such as "therapist models gestalts while playing with a farm toy" would be ridiculously restrictive, while "therapist models language" is probably too vague. Better checklist items could be, "between 40% and 60% of the therapist's models are at the child's current NLA stage" and "The therapist models language relating to the focus of the child's attention."
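To show how a quantitative checklist item like the 40% to 60% example could be checked against session data, here is a minimal sketch. It assumes someone has already coded each therapist model with an NLA stage; the thresholds are just the illustrative ones above, not validated criteria, and the function names are hypothetical.

```python
def share_at_stage(model_stages, child_stage):
    """Fraction of the therapist's coded models that match the child's stage."""
    if not model_stages:
        raise ValueError("The session must contain at least one coded model.")
    return sum(stage == child_stage for stage in model_stages) / len(model_stages)


def meets_stage_item(model_stages, child_stage, low=0.40, high=0.60):
    """Check the illustrative item: 40% to 60% of models at the child's stage."""
    return low <= share_at_stage(model_stages, child_stage) <= high


# Made-up sessions: each number is the stage coded for one therapist model.
session_1 = [2, 2, 3, 2, 1, 2, 3, 3, 2, 4]   # 5 of 10 models at stage 2
session_2 = [2, 2, 2, 2, 2, 2, 2, 2, 3, 2]   # 9 of 10 models at stage 2

print(meets_stage_item(session_1, child_stage=2))  # True
print(meets_stage_item(session_2, child_stage=2))  # False
```

A check like this is only as trustworthy as the stage coding behind it, so it depends on the scoring reliability work described earlier.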
Once definitions and treatment fidelity have been ironed out, we're ready to start documenting actual therapy and language growth. I know that the groundwork feels boring and it's tempting to skip steps, but we owe it to our kids and our communities to conduct research with the care it deserves. Case studies are a good first step for publications because they have one single participant and often do not require IRB review. Case studies are typically retrospective and recount in great detail the clinical course of one particular patient. A good case study about the NLA Protocol would report analysis of all language samples conducted over the course of treatment. If making full language samples available is not possible, the report should include at a minimum:
- The time of each sample relative to the start of treatment (e.g., first assessment, month 1, month 2, etc.)
- Percent of utterances at each stage for each language sample
- The stage the child was determined to be in at each language sample
- A detailed description of the child's strengths and weaknesses at the start and end of treatment, and periodically during treatment
- A detailed description of treatment procedures: how often did the child attend sessions, in what setting, and for how long? Who was present for sessions? What kind of carryover was recommended and completed? What kinds of activities were used in sessions?
- A report of treatment fidelity and any procedures in place to calculate the reliability of language sample scoring.
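
The "percent of utterances at each stage" item in the list above could be tabulated along these lines. The scores are invented, and the sketch assumes every utterance in a sample has been assigned exactly one of the six stages; how to handle unscorable utterances is a decision the study would need to make.

```python
from collections import Counter

NLA_STAGES = range(1, 7)  # the six stages, numbered 1 through 6


def percent_by_stage(utterance_stages):
    """Percent of a sample's utterances scored at each stage (0.0 if none)."""
    if not utterance_stages:
        raise ValueError("A language sample needs at least one scored utterance.")
    counts = Counter(utterance_stages)
    total = len(utterance_stages)
    return {stage: 100 * counts[stage] / total for stage in NLA_STAGES}


# Made-up samples, keyed by time relative to the start of treatment.
samples = {
    "first assessment": [1, 1, 2, 2, 2, 3, 1, 2],
    "month 1": [1, 2, 2, 2, 3, 3, 2, 2, 4, 2],
}

for label, scores in samples.items():
    row = percent_by_stage(scores)
    cells = "  ".join(f"S{stage}: {pct:5.1f}%" for stage, pct in row.items())
    print(f"{label:<18}{cells}")
```

Reporting the full distribution, rather than only the single stage assigned, lets readers see how mixed each sample was.
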
Case studies would be important in several ways. For one, they show that it is possible to implement NLA scoring and the NLA protocol with reliability and fidelity. And because they are not a very strong type of evidence, they can motivate later studies that are stronger forms of evidence. It's unlikely to get funding for a clinical trial, or to attract participants to a clinical trial, if no previous work has been done to justify the time and expense.
Single Subject Design studies are often associated with applied behavior analysis, but that doesn't mean ABA is the only field that can do them! The idea behind an SSD is that the participant is their own control group. It's the ultimate in recognizing the uniqueness of an individual! While I don't normally recommend Wikipedia for research, I actually think the Wikipedia page gives a decent and accurate overview of considerations for single subject research design. In a very basic sense, you would start by identifying an outcome measure that you can measure at each session. Maybe how many times the child spontaneously initiates multimodal communication. Then you identify your treatment and comparison: Maybe you decide to compare modeling as aligned with the NLA protocol against modeling and naturalistic prompting as aligned with Enhanced Milieu Teaching. You might alternate interventions by session while tracking data on your outcome measure. You would need to consult with someone knowledgeable about this type of design (not me!) to account for the ways sessions might influence each other and all kinds of other considerations.
I am aware of one dissertation that was written about NLA using single subject design; however, the university has not made the document public so I do not know what the findings were. If you have a copy of that dissertation or are the author and are willing to share it, please contact me at support@gestaltlanguageprocessing.org.
Marge Blanc's 2012 book was based on data gathered from 85 autistic children who were seen at her non-profit clinic, yet no summary data is reported in the book. A 2024 addendum published on her website states that "All 85 progressed from at least one stage to the next while they were in the clinic" and "the vast majority (more than 80%) progressed from Stage 1 or 2 to at least a solid Stage 4. Of those who developed grammar, all but the oldest client (Benjamin) reached Stage 5 (advanced grammar) or 6 (a full grammar system)." The book and subsequent addenda contain no mention of ethics oversight or informed consent.
The book does not report background information on the participants. The following information should be included in a longitudinal study and was not included in the book:
- Age at the start of therapy
- Diagnoses
- Stage of NLA at the start of therapy
- Gender, race, and socioeconomic status (SES), to monitor how representative the sample is of the population
- Description of any other therapies the children received
- Inclusion and exclusion criteria
- How many participants were invited to participate and chose not to; how many dropped out of the study.
The book also does not report details about the therapy received. The following information should be included in longitudinal observational studies of NLA:
- Who provided the therapy: an SLP? Two SLPs? Grad students? A parent training model?
- What happened during sessions? What treatment fidelity monitoring procedures were in place? Therapy should be described in enough detail that an SLP could replicate it from the description.
- Data on the frequency, rate, and duration of sessions. This could vary by participant, but would need to be documented.
- Reasons for stopping therapy
- How often language samples were taken and analyzed. Ideally, this would follow a pre-registered plan such as analyzing a language sample on the first session of every month. If therapists are choosing which language samples to analyze based on their judgement of a session being representative, that would be a serious source of bias.
The primary outcome measure in the book seems to be the stage of a child's language development as determined by language sampling. However, without information on reliability of language sample scoring, it's hard to know what to make of that outcome measure. Additionally, the book and its addenda do not report summary data beyond what I quoted at the start of this section. Here are some things I'd be looking for in the reporting of results of a longitudinal study:
- A summary table that reports for each child:
  - Age at the start of therapy
  - NLA stage at the start of therapy
  - How much time they spent in each stage of NLA
- Aggregate data modeling growth in the NLA stages over time. This would require some statistical skill and decision-making; SLPs are not trained in this kind of statistics and would need to collaborate with a data scientist (I volunteer!).
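
As a starting point for the per-child summary table described above, the sketch below builds one row per child from monthly stage assignments. The children, ages, and stages are entirely made up, and the sketch assumes one scored language sample per month, so each observation counts as one month in that stage. The growth modeling mentioned in the last bullet is a separate and more involved step that this does not attempt.

```python
from collections import Counter


def summarize_child(child_id, age_months_at_start, monthly_stages):
    """Build one summary row from a child's chronological monthly stages."""
    if not monthly_stages:
        raise ValueError("Each child needs at least one scored language sample.")
    months = Counter(monthly_stages)
    return {
        "child": child_id,
        "age_at_start_months": age_months_at_start,
        "stage_at_start": monthly_stages[0],
        # Months observed at each of the six stages (0 if never observed).
        "months_in_stage": {stage: months[stage] for stage in range(1, 7)},
    }


# Hypothetical records: (ID, age in months at start, stage at each monthly sample).
records = [
    ("child_01", 38, [1, 1, 1, 2, 2, 3, 3, 3, 4]),
    ("child_02", 52, [2, 2, 2, 2, 3, 3]),
]

table = [summarize_child(*record) for record in records]

print("child     age  start  months in stages 1-6")
for row in table:
    months = " ".join(str(m) for m in row["months_in_stage"].values())
    print(f"{row['child']}  {row['age_at_start_months']:>3}  {row['stage_at_start']:>5}  {months}")
```
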
With appropriate IRB oversight and careful planning, a longitudinal observational study could be conducted using the data that clinicians around the world are already gathering or have already gathered. Ethical oversight is important because SLPs can't just mail their data to a researcher to include in a study: that would violate the ethical principle of autonomy, whereby participants need to agree to participate in research. At the same time, there are established procedures and safeguards for this kind of retrospective chart review research, which is why consulting with someone knowledgeable about those procedures and including an IRB would be important.
I'll admit that I get frustrated by most conversations about NLA and clinical trials. I don't think we need a clinical trial before anyone can use NLA, but I do think that definitions, connection to existing literature, reliability, and a fidelity checklist are must-haves, alongside active efforts to conduct case studies, single subject design studies, and observational studies. When critics insist on a clinical trial, they are rightly met with criticism that many other approaches SLPs use are not backed by clinical trials. And when proponents of NLA say "the critics are insisting on a clinical trial," they ignore the many other criticisms that have been raised against NLA. There are so many steps that need to happen before it's even reasonable to plan a clinical trial investigating NLA, and I think it is worth asking why those steps have not been taken. Many of these steps do not require funding or access to a research university, so issues like gatekeeping of research funds can't be the whole explanation.


Literature Review

Development of Definitions

GLP Definition - Validity (Connection to Existing Literature)

GLP Definition - Reliability

NLA Stages - Reliability

Investigation of ALP/GLP Processing Continuum: Processing Studies

NLA Protocol: Fidelity

NLA Protocol: Case Study

NLA Protocol: Single Subject Design

NLA Protocol: Observational Study

NLA Protocol: Clinical Trial
