In recent years, generative AI for creative work, especially image generation, has rapidly shifted from being a "surprising new technology" to a "practical tool that transforms business operations."
In March 2025, OpenAI integrated native image generation into GPT-4o, allowing users to move interactively from text to image to differential corrections within a single chat thread. What is remarkable about GPT-4o's image generation is its ability to render "text" with high precision, something even Midjourney, previously considered cutting-edge, could not achieve. This breakthrough suggests that generative AI output, once limited to serving as raw "material," may now be usable as a "finished product."
The video domain has also seen stunning quality improvements. Google DeepMind's Veo generates videos up to 60 seconds long from text input alone, handling complex camera work and depth-of-field effects. Subsequent updates added automatic sound effects and dialogue, dramatically accelerating prototyping for the short-form videos typically used in social media advertising. On video quality, ByteDance's Seedance 1.0 quickly generates 10-second multi-scene videos, with prompt-following accuracy and human realism that appear to surpass other models. For creative work that simply animates existing still images, Runway remains competitive.
These trends are accelerating domestic adoption. According to a survey by IDC Japan, the domestic generative AI services market reached 101.6 billion yen in 2024 (source: https://my.idc.com/getdoc.jsp?containerId=prJPJ52722724). We increasingly see what appear to be AI-generated advertisements online, and given the quality of recent creative generation, there are likely cases we do not even notice. Major brands are also embracing the technology: Ito En aired a TV commercial featuring an AI model, and Toyota Motor Corporation aired a fully AI-generated commercial, showing gradual AI adoption in domains once considered the exclusive territory of major advertising agencies and select video production professionals.
While the momentum behind generative AI in creative domains is strong, day-to-day practice still falls short of true usability.
First, the technical limitations of the generative models themselves remain significant. While diffusion models achieve high-resolution, photorealistic output, they still cannot meet corporate demands for perfect adherence to brand colors or specified fonts. Color codes supplied as constraints still come out with errors, prompting research into correction algorithms applied after generation (https://arxiv.org/abs/2404.06865). Images containing text still suffer from character deformation and spelling errors at a certain frequency. Eliminating these inconsistencies entirely requires more than improving current models: manual verification at the final stage remains essential. When the cost of producing a single creative piece is considered, many cases still favor human execution for better cost performance.
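To make that manual verification step concrete, here is a minimal sketch of flagging off-brand colors in a generated asset. The brand palette, tolerance, and sampled pixel values are all illustrative, and plain RGB distance is used as a rough stand-in for a proper perceptual metric such as ΔE:

```python
# Minimal sketch: flag colors in a generated asset that drift from the
# brand palette. Palette, tolerance, and pixel samples are illustrative.

def hex_to_rgb(hex_code: str) -> tuple:
    """Convert '#RRGGBB' to an (r, g, b) tuple."""
    h = hex_code.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def color_distance(c1: tuple, c2: tuple) -> float:
    """Euclidean distance in RGB space (a rough proxy for perceptual delta-E)."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

def off_brand_pixels(pixels, palette, tolerance=20.0):
    """Return pixels whose nearest brand color is farther than `tolerance`."""
    brand_rgb = [hex_to_rgb(c) for c in palette]
    return [p for p in pixels
            if min(color_distance(p, b) for b in brand_rgb) > tolerance]

# Hypothetical brand palette and pixels sampled from a generated banner
palette = ["#E60012", "#FFFFFF"]
samples = [(230, 0, 18), (228, 5, 20), (120, 200, 90)]  # last one drifts
print(off_brand_pixels(samples, palette))  # → [(120, 200, 90)]
```

In practice the pixel samples would come from the generated image itself (e.g. via an image library), and borderline results would still go to a human reviewer, in line with the verification step described above.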
Second, commercial workflows face legal and quality-control barriers. QC (Quality Control) systems that automatically satisfy each medium's submission requirements and legal or industry guidelines remain immature. TV commercials and web advertisements each carry submission specifications for aspect ratio, bit rate, audio, and so on. While services claiming "automatic compliance checking" are emerging, practical use still requires manual verification because of false positives and missed detections. On the legal side, questions of portrait rights and copyright for generated creative remain underdeveloped. While litigation risk for user companies may be low, the reputational risk of a generated piece accidentally resembling someone's copyrighted work is unavoidable.
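The mechanical half of such compliance checking can be sketched simply. The spec table, field names, and threshold values below are hypothetical, and a real pipeline would extract the metadata from the file itself; the point is that rule-based checks like these are automatable, while the judgment calls (prohibited expressions, resemblance risks) are not:

```python
# Minimal sketch: validate creative metadata against per-medium submission
# specs before human review. Spec values and field names are illustrative.

SPECS = {
    "tv_15s": {"aspect": (16, 9), "max_bitrate_kbps": 50_000, "duration_s": 15},
    "web_banner": {"aspect": (1, 1), "max_bitrate_kbps": 8_000, "duration_s": 6},
}

def check_submission(asset: dict, medium: str) -> list:
    """Return a list of human-readable violations (empty means pass)."""
    spec = SPECS[medium]
    issues = []
    w, h = asset["width"], asset["height"]
    aw, ah = spec["aspect"]
    if w * ah != h * aw:  # cross-multiply to compare ratios without floats
        issues.append(f"aspect ratio {w}x{h} is not {aw}:{ah}")
    if asset["bitrate_kbps"] > spec["max_bitrate_kbps"]:
        issues.append("bitrate exceeds the medium's ceiling")
    if asset["duration_s"] != spec["duration_s"]:
        issues.append("duration does not match the slot length")
    return issues

asset = {"width": 1920, "height": 1080, "bitrate_kbps": 45_000, "duration_s": 15}
print(check_submission(asset, "tv_15s"))  # → [] (all checks pass)
```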
Third, the human insight required at the strategic layer is irreplaceable. Generative AI has dramatically accelerated work such as persona creation and market-needs analysis, but judging the plausibility of its output requires trend awareness and a consumer's perspective. AI can generate ideas; the aesthetic judgment to evaluate their quality still rests with human users.
As outlined above, three elements block generative AI's business application: model accuracy, legal/QC maturity, and human knowledge. The current realistic optimum is therefore "Human-in-the-Loop": humans provide direction and design guardrails, generative AI presents numerous options, and humans select and refine based on both data and intuition.
While significant barriers remain for fully automated creative completion, correctly identifying application points in workflows and strategically inserting generative AI can dramatically reduce production time and verification cycles.
First is static banner "material" generation, as mentioned earlier. While product image generation remains challenging, areas such as human models holding products, background materials, and concept images have reached sufficient usability. For B2B advertisements that do not require extensive visual materials, the quality is more than adequate for finished products. Next is copywriting. With proper communication-design information provided upfront and prompt engineering to prevent generic copy, AI becomes an unparalleled supporter for ideation. While tone and overall layout still require human intervention, JAPAN AI is running PoCs with multiple advertising agencies to address even these areas with generative AI, with promising early results.
In video advertising, overall scenario structure is another promising application. In the past, especially for ads built on long videos, story writing was a specialized field that only a select few writers, often with TV-shopping experience, could handle. Scenario structure, however, follows recognizable patterns: once a pattern is chosen, a video ad scenario can be completed simply by applying each product's characteristics to it.
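The pattern-plus-product-facts idea can be sketched as a simple template fill. The four-beat arc and all field names below are illustrative examples, not an industry standard; in a real workflow the filled skeleton would be handed to a generative model for fluent expansion, then to a human for polishing:

```python
# Minimal sketch: apply one product's characteristics to a shared
# ad-scenario pattern. The arc and field names are illustrative.

PATTERN = [
    "Hook: {pain_point}",
    "Solution: introducing {product}, which {benefit}",
    "Proof: {evidence}",
    "Offer: {call_to_action}",
]

def build_scenario(product_facts: dict) -> str:
    """Fill each beat of the pattern with product-specific facts."""
    return "\n".join(beat.format(**product_facts) for beat in PATTERN)

# Hypothetical product facts
facts = {
    "pain_point": "tired of lukewarm coffee at your desk?",
    "product": "the ThermoMug X",
    "benefit": "keeps drinks hot for 8 hours",
    "evidence": "rated 4.8/5 by 10,000 reviewers",
    "call_to_action": "order today for 20% off",
}
print(build_scenario(facts))
```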
Dividing roles this way, with generative AI responsible for explosive gains in speed and variation while humans focus on selection and polishing, is already proving effective in commercial advertising. As long as the production team consciously designs for human-in-the-loop, AI goes beyond being an assistant and functions as a turbocharger, speeding up the test-and-learn cycle by an order of magnitude. Precisely because full automation is still a long way off, this "explosive gain from partial automation" is worth pursuing.
Research into diffusion models with complete style compliance will advance, significantly reducing color-code and font deviations. When layout-aware language models that output text and coordinates simultaneously emerge, text corruption in banner generation will move toward resolution. As rights-cleared synthetic human models become widespread, portrait-rights risks will decrease. When QC models that analyze images and text comprehensively to detect media specifications and prohibited words automatically become practical, even pre-submission checking will fall within automation's scope. Given the current pace of generative AI, these developments do not seem far off.
However, even as AI-generated creative that plays on human desires and behaviors floods the advertising landscape, generating deep insight into social behavior, steering brand strategy, and making final risk judgments will remain in the human domain.
The current optimal solution is "Human-in-the-Loop": humans first design briefs and guardrails, then have AI generate numerous options, quickly identify winning candidates through metrics, and refine only the best options by hand. Cycling through this process rapidly maximizes both AI's strength in mass production and humans' strength in insight.
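The cycle described above can be sketched as a short loop. The generator, the scoring metric (imagine predicted CTR), and the refinement step are all stand-in stubs, and the variant counts are arbitrary; only the shape of the loop, AI mass production, metric-based narrowing, human refinement of the shortlist, is the point:

```python
# Minimal sketch of the Human-in-the-Loop cycle: AI mass-produces
# variants, metrics narrow the field, a human refines the shortlist.
# Generator, scorer, and refiner are illustrative stubs.
import random

def generate_variants(brief: str, n: int) -> list:
    """Stub for a generative model producing n candidate copies."""
    return [f"{brief} (variant {i})" for i in range(n)]

def score(variant: str) -> float:
    """Stub for a metric such as predicted CTR; seeded for determinism."""
    random.seed(variant)
    return random.random()

def human_refine(variant: str) -> str:
    """Placeholder for the human polishing step."""
    return variant + " [refined by human]"

def hitl_cycle(brief: str, n_variants: int = 20, shortlist: int = 3) -> list:
    candidates = generate_variants(brief, n_variants)              # AI: mass production
    top = sorted(candidates, key=score, reverse=True)[:shortlist]  # metrics: narrowing
    return [human_refine(v) for v in top]                          # human: refinement

finalists = hitl_cycle("Summer sale banner copy")
print(len(finalists))  # → 3
```

Each pass through `hitl_cycle` corresponds to one test-and-learn iteration; in production, the scores would come from live metrics fed back after each round.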