Review Analysis Methodology
Sellyze uses a multi-stage review intelligence system to identify statistically meaningful product weaknesses and market opportunities across Amazon, Etsy, and TikTok Shop.
Why typical seller research fails
Four-stage analysis pipeline
Detection
Stratified Sampling
Marketplace reviews are typically 85-95% positive, so a simple random sample contains few negative reviews and can miss uncommon problems entirely. Sellyze collects hundreds of reviews per product (up to 2,000 on Pro plans) and applies stratified sampling. "Stratified" means the sample is split into layers by star rating instead of being drawn at random from the whole pool. Lower-rated reviews are sampled above their natural proportion, while the number of positive reviews is capped. The oversampling rate is calculated dynamically from each product's actual rating distribution. The result is a balanced dataset with enough negative signal to detect even uncommon issues. A product with 5,000 reviews and a 4.3 average might have only 400 negative reviews. Stratified sampling ensures those 400 are well represented in the analysis instead of being drowned out by the 4,600 positive ones.
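The Python sketch below shows the general idea of stratified sampling by star rating. It is not Sellyze's implementation: treating 1-3 stars as "lower-rated", the `max_positive_share` cap of 40%, and the function name `stratified_sample` are all hypothetical choices made for illustration. Each star rating is its own stratum with its own sampling rate, and those rates are derived from the product's own distribution.

```python
import random
from collections import defaultdict

LOW_STARS = (1, 2, 3)   # assumption: 1-3 stars count as "lower-rated"
HIGH_STARS = (4, 5)


def stratified_sample(reviews, budget, max_positive_share=0.4, seed=42):
    """Sample reviews per star-rating stratum, oversampling low ratings.

    reviews: list of dicts with a "stars" key (1-5).
    budget: maximum number of reviews to keep.
    max_positive_share: hypothetical cap on the share of the budget
        that 4-5 star reviews may occupy.
    Returns (sample, rates), where rates[s] is the sampling rate of stratum s.
    """
    strata = defaultdict(list)
    for review in reviews:
        strata[review["stars"]].append(review)

    n_low = sum(len(strata[s]) for s in LOW_STARS)
    n_high = sum(len(strata[s]) for s in HIGH_STARS)

    # Cap positive reviews first, then give the rest of the budget to low ratings.
    high_quota = min(n_high, int(budget * max_positive_share))
    low_quota = min(n_low, budget - high_quota)

    # Per-stratum sampling rates, computed from this product's own distribution.
    rates = {}
    for s in LOW_STARS:
        rates[s] = low_quota / n_low if n_low else 0.0
    for s in HIGH_STARS:
        rates[s] = high_quota / n_high if n_high else 0.0

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sample = []
    for s, group in strata.items():
        k = min(len(group), round(rates[s] * len(group)))
        sample.extend(rng.sample(group, k))
    return sample, rates
```

With the 5,000-review example above (400 reviews at 1-3 stars) and a budget of 1,000, this sketch keeps all 400 low-rated reviews (a sampling rate of 1.0) and caps the 4-5 star reviews at about 400, instead of the roughly 80 low-rated reviews a 1,000-review random sample would be expected to contain. The returned `rates` are what a later weighting step needs in order to undo the oversampling.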
Classification
AI-Powered
Each review is decomposed into individual claims using purpose-built AI models, because a single review often contains several distinct complaints. "The lid leaks and it's hard to clean" becomes two separate claims. "Great product but the handle broke after a week and the color is not what I expected" becomes three. Claims are classified into six core categories: product_defect (something broke or failed), missing_feature (something buyers expected but the product lacks), usability_issue (the product works but is frustrating to use), preference_mismatch (subjective taste, like color or size), logistics_complaint (shipping, packaging, or fulfillment problems), and user_error (the buyer misused the product). Etsy and TikTok Shop analyses add platform-specific extensions such as sizing_fit, description_mismatch, seller_communication, and safety_concern. Classified claims are then clustered into named pain points: two claims are grouped if fixing one would fix the other. The models are configured for deterministic, reproducible results.
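The category names above map naturally onto an enumeration. The clustering rule ("grouped if fixing one would fix the other") can be sketched as a union-find over claim pairs. This is a hypothetical data model, not Sellyze's: the `Claim` fields, the `fix` label, and the `same_fix` predicate are assumptions. In the real pipeline, that judgment would come from an AI model, not a string comparison. The sketch also assumes grouping is transitive, so if A and B share a fix and B and C share a fix, all three end up in one pain point.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Category(Enum):
    # Six core categories named in the methodology.
    PRODUCT_DEFECT = "product_defect"
    MISSING_FEATURE = "missing_feature"
    USABILITY_ISSUE = "usability_issue"
    PREFERENCE_MISMATCH = "preference_mismatch"
    LOGISTICS_COMPLAINT = "logistics_complaint"
    USER_ERROR = "user_error"
    # Platform-specific extensions for Etsy and TikTok Shop.
    SIZING_FIT = "sizing_fit"
    DESCRIPTION_MISMATCH = "description_mismatch"
    SELLER_COMMUNICATION = "seller_communication"
    SAFETY_CONCERN = "safety_concern"


@dataclass(frozen=True)
class Claim:
    review_id: str
    text: str
    category: Category
    fix: str  # hypothetical field: the change that would resolve this claim


def cluster_claims(
    claims: list[Claim], same_fix: Callable[[Claim, Claim], bool]
) -> list[list[Claim]]:
    """Group claims into pain points: two claims join if fixing one fixes the other."""
    parent = list(range(len(claims)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Compare every pair once and merge the groups of matching claims.
    for i in range(len(claims)):
        for j in range(i + 1, len(claims)):
            if same_fix(claims[i], claims[j]):
                parent[find(i)] = find(j)

    clusters: dict[int, list[Claim]] = {}
    for i, claim in enumerate(claims):
        clusters.setdefault(find(i), []).append(claim)
    return list(clusters.values())
```

For example, splitting "The lid leaks and it's hard to clean" into a leak claim and a cleaning claim, then adding a second review's "leaked in my bag", yields two pain points: a leaking lid (two claims) and difficult cleaning (one claim). The pairwise loop is quadratic, which is acceptable for a sketch running on a few thousand claims.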
Prevalence
Statistical Correction
Because lower-rated reviews are oversampled, raw frequency counts overstate how common each problem is. Sellyze corrects for this using weighting methods borrowed from survey research. Each review is assigned a weight based on its star rating and the product's actual rating distribution, so reviews from over-sampled ratings count for less and reviews from under-sampled ratings count for more. Mention rates are then calculated from these weights, producing prevalence estimates that reflect how common each problem is across the product's full review set rather than in the oversampled subset. This step is what separates Sellyze from tools that just count keywords: without prevalence correction, every pain point looks like a bigger deal than it actually is.
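The methodology does not name its exact weighting scheme. This sketch assumes standard inverse-probability (design) weights from survey sampling: every sampled review with rating s gets weight N_s / n_s, where N_s is the number of reviews with that rating on the listing and n_s is the number sampled. The function names are hypothetical.

```python
from collections import Counter


def design_weights(population_counts, sample):
    """Inverse-probability weight per star rating: N_s / n_s.

    population_counts: {stars: total reviews with that rating on the listing}
    sample: list of dicts with a "stars" key (the oversampled reviews).
    """
    sample_counts = Counter(r["stars"] for r in sample)
    return {s: population_counts[s] / n for s, n in sample_counts.items()}


def weighted_mention_rate(sample, mentions, weights):
    """Weighted share of sampled reviews for which mentions(review) is true."""
    total = sum(weights[r["stars"]] for r in sample)
    hits = sum(weights[r["stars"]] for r in sample if mentions(r))
    return hits / total if total else 0.0
```

As a worked example, suppose a listing has 5,000 reviews, all 400 reviews at 1-3 stars are sampled along with 400 reviews at 4-5 stars, and a leak is mentioned in 120 of the sampled 1-star reviews and nowhere else. The raw rate is 120 / 800 = 15%. Every 1-star review was sampled, so each carries weight 1, and the weights across the whole sample add up to 5,000. The corrected prevalence is therefore 120 / 5,000 = 2.4%.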
Validation
Cross-Product Replication
Individual product analysis has inherent sample-size limitations. The Opportunity Scanner examines up to 60 products in a category and runs deep AI analysis on the top competitors. This is where single-product findings get stress-tested. A pain point found independently in 4 out of 10 products is a category-level gap, not a single-product fluke. If buyers of different brands at different price points all complain about the same thing, that complaint represents a real market opportunity. A pain point that appears in only one product might just be a bad batch or a single manufacturer cutting corners. Cross-product replication is the strongest validation step in the pipeline, and it rests on the same principle as replication in scientific research.
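A minimal sketch of the replication check, assuming pain points from different products have already been matched to shared labels (in practice that matching is itself a model step). The thresholds `min_products=3` and `min_share=0.3` are hypothetical values chosen so that the "4 out of 10 products" example qualifies. They are not Sellyze's published cutoffs.

```python
from collections import Counter


def replication_report(findings, min_products=3, min_share=0.3):
    """Classify pain points by how many scanned products share them.

    findings: {product_id: set of pain-point labels found in that product}
    min_products, min_share: hypothetical thresholds for "category-level".
    Returns {label: (product_count, share_of_products, verdict)}.
    """
    n_products = len(findings)
    counts = Counter(label for labels in findings.values() for label in labels)
    report = {}
    for label, k in counts.items():
        share = k / n_products
        verdict = (
            "category-level"
            if k >= min_products and share >= min_share
            else "product-specific"
        )
        report[label] = (k, share, verdict)
    return report
```

Requiring both an absolute count and a share prevents a small scan (for example, 1 of 2 products) from looking like a category-wide gap.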
How Sellyze handles noise
Not all negative reviews are equal
Most review analysis tools treat all negative reviews equally. A shipping delay complaint gets the same weight as a product breaking after two weeks. That is a problem. Sellyze separates logistics issues, user errors, and personal preferences from actual product defects and missing features.
This noise filtering changes the picture. A product with 25% negative reviews where most complaints are about late delivery is a very different opportunity from one where buyers say the handle snaps off. The first product might be fine: the delivery issues belong to the fulfillment provider, not the product itself. The second product has a real design flaw that a competitor could fix.
Only actionable complaints count toward the opportunity score. Logistics complaints, user errors, and pure preference mismatches are tracked and reported but do not inflate the pain point severity or the overall opportunity grade. This means the scores you see reflect problems you can actually solve with a better product.
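The filtering rule can be sketched as a split on claim category. The three excluded categories come directly from the text above. Everything else is treated as actionable, and the `PainPoint` record and `top_actionable_prevalence` helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass

# Categories that are tracked and reported but excluded from scoring.
EXCLUDED = frozenset({"logistics_complaint", "user_error", "preference_mismatch"})


@dataclass(frozen=True)
class PainPoint:
    name: str
    category: str
    prevalence: float  # weighted share of reviews mentioning it (0-1)


def split_noise(points):
    """Return (actionable, tracked_only) lists of pain points."""
    actionable = [p for p in points if p.category not in EXCLUDED]
    tracked_only = [p for p in points if p.category in EXCLUDED]
    return actionable, tracked_only


def top_actionable_prevalence(points):
    """Largest prevalence among actionable pain points; 0.0 if there are none."""
    actionable, _ = split_noise(points)
    return max((p.prevalence for p in actionable), default=0.0)
```

In the two-product comparison above, a product whose complaints are mostly late delivery would report its logistics pain point but contribute 0.0 actionable prevalence, while the product with a snapping handle would surface that defect's full prevalence.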
Severity Calibration
Impact-aware classification
Each pain point receives a severity rating based on both its frequency and its impact type. Functional failures (leaks, breakage, safety issues) reach higher severity at lower frequencies than cosmetic or preference issues, because a core function failure at any rate is more consequential than an aesthetic complaint. For example, a safety defect affecting 1% of buyers is rated higher than a cosmetic issue affecting 5%. The four severity levels, described below from most to least severe, use adaptive thresholds calibrated by issue category.
Core function failure. The product does not do what it is supposed to do. A water bottle handle that breaks off, a phone case that cracks on first drop, a blender that stops working after a week. Safety issues and functional failures trigger critical severity at lower frequency thresholds than cosmetic problems.
Degrades the primary use case without completely breaking it. A lid that does not seal properly, a backpack zipper that sticks, a cutting board that warps in the dishwasher. The product still works, but the experience is significantly worse than expected.
Noticeable but not deal-breaking. Color fades after a month, stitching looks uneven, the product is slightly smaller than the listing photos suggest. Cosmetic issues that buyers notice and mention but that do not affect core functionality.
Personal preference, not a defect. Wishes it came in more colors, thinks the logo is too prominent, would prefer a matte finish instead of glossy. These are subjective and usually not actionable as product improvements.
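The interaction between impact type and frequency can be sketched as a small rule table. This is a deliberately simplified version of the idea: Sellyze's thresholds are adaptive and not published. The numbers here are hypothetical, and only the "critical" level name appears in the text. The labels "major", "minor", and "preference" are illustrative.

```python
# Hypothetical level names, ordered least to most severe. Only "critical"
# is named in the methodology; the others are illustrative labels.
LEVELS = ("preference", "minor", "major", "critical")

# Hypothetical rules per impact type:
#   (level if prevalence >= threshold, threshold, level otherwise)
# Functional failures reach "critical" at a far lower prevalence than
# degraded-use problems, and cosmetic issues never exceed "minor".
RULES = {
    "functional": ("critical", 0.01, "major"),
    "degraded": ("major", 0.05, "minor"),
    "cosmetic": ("minor", 0.0, "minor"),
    "preference": ("preference", 0.0, "preference"),
}


def severity(impact: str, prevalence: float) -> str:
    """Map an impact type and weighted prevalence (0-1) to a severity level."""
    above, threshold, below = RULES[impact]
    return above if prevalence >= threshold else below


def rank(level: str) -> int:
    """Position in LEVELS; higher means more severe."""
    return LEVELS.index(level)
```

Under these rules, a functional defect at 1% prevalence is rated "critical", while a cosmetic issue at 5% is only "minor", which matches the ordering described above.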
Opportunity Score
Products are evaluated across five dimensions. The final score is a weighted composite scaled to 0-100.
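A weighted composite scaled to 0-100 can be sketched as follows. The methodology does not publish its dimension weights, so the weights and the short dimension keys below are hypothetical. Each key stands for one of the five questions that follow, and each sub-score is assumed to be normalized to the range 0-1.

```python
# Hypothetical dimension keys and weights; the methodology does not publish them.
WEIGHTS = {
    "pain_points": 0.30,   # are the complaints fixable and impactful?
    "demand": 0.25,        # is the product selling? growing or declining?
    "competition": 0.20,   # how hard is it to win market share?
    "feasibility": 0.10,   # how easy is it to manufacture and launch?
    "margin": 0.15,        # can you make money at this price point?
}


def opportunity_score(sub_scores, weights=WEIGHTS):
    """Weighted composite of 0-1 sub-scores, scaled to 0-100."""
    if set(sub_scores) != set(weights):
        raise ValueError("expected exactly these dimensions: " + ", ".join(sorted(weights)))
    for name, value in sub_scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total_weight = sum(weights.values())
    weighted = sum(weights[k] * sub_scores[k] for k in weights)
    return 100.0 * weighted / total_weight
```

With sub-scores of 0.8, 0.6, 0.5, 0.9, and 0.4 (in the order listed), this sketch gives 24 + 15 + 10 + 9 + 6 = 64. Dividing by the total weight keeps the result on a 0-100 scale even if the weights do not sum exactly to 1.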
Are the complaints fixable and impactful?
Scores higher when complaints involve core function failures that are fixable with a design change. Cosmetic or preference issues score lower. A water bottle that leaks scores higher than one that buyers wish came in more colors.
Is the product selling? Growing or declining?
Based on sales volume, BSR trends, and review velocity. A product selling 1,000+ units per month with growing reviews scores high. A product with declining BSR and slowing review growth scores low.
How hard is it to win market share?
Fewer sellers, lower average review counts, and no dominant brand with 10,000+ reviews mean less resistance to entry. A category where the top 10 sellers all have under 500 reviews is far easier to enter than one with an entrenched leader.
How easy is it to manufacture and launch?
Simple products that use standard materials and existing molds score high. Products requiring patents, certifications, or complex tooling score low. A silicone kitchen utensil is easy. A Bluetooth device with FCC certification is not.
Can you make money at this price point?
Based on the current selling price minus estimated landed cost and marketplace fees. Products under $15 rarely support healthy margins after advertising spend. The sweet spot is $20-50 with a 3-4x markup on manufacturing cost.
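The margin arithmetic can be made concrete with a small per-unit calculation. All fee inputs here (a 15% referral rate, a $5.00 fulfillment fee, and $4.00 of advertising per unit sold) are hypothetical placeholders, not actual marketplace fee schedules. The `UnitEconomics` class is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class UnitEconomics:
    price: float
    manufacturing_cost: float
    freight_and_duty: float   # per unit; added to manufacturing cost for landed cost
    referral_rate: float      # hypothetical marketplace fee as a share of price
    fulfillment_fee: float    # hypothetical fixed fee per unit
    ad_spend: float           # hypothetical advertising cost per unit sold

    @property
    def landed_cost(self) -> float:
        return self.manufacturing_cost + self.freight_and_duty

    @property
    def markup(self) -> float:
        """Selling price as a multiple of manufacturing cost."""
        return self.price / self.manufacturing_cost

    @property
    def net_margin(self) -> float:
        """Price minus landed cost, marketplace fees, and ad spend, per unit."""
        fees = self.price * self.referral_rate + self.fulfillment_fee
        return self.price - self.landed_cost - fees - self.ad_spend

    @property
    def net_margin_pct(self) -> float:
        return self.net_margin / self.price
```

With these placeholder fees, a $30 product that costs $8 to make (a 3.75x markup) and $2 to ship keeps about $6.50 per unit, or roughly 22%. A $12 product at a 4x markup ($3 to make, $1 to ship) loses about $2.80 per unit, because the fixed per-unit fees and ad spend take a much larger bite out of a low price.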
Known Limitations
Transparency builds trust
- Classifier accuracy has not been validated against human-labeled data. Precision and recall are unmeasured.
- Products with very few negative reviews may yield limited negative samples, reducing per-product prevalence reliability.
- Review date metadata may be unavailable for some non-English marketplaces due to format variability.
- Sales volume estimates are qualitative where exact ranking data is unavailable from the marketplace.
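- Review analysis reflects what buyers write, not what all buyers experience. Satisfied customers leave reviews less often than dissatisfied ones. Prevalence weighting corrects for Sellyze's own oversampling of negative reviews, but it cannot fully correct for which buyers choose to write reviews in the first place, so some bias remains.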
- Sellyze does not verify supplier feasibility. A product improvement may be technically possible but prohibitively expensive to manufacture. Always validate specs with your supplier before committing to production.
What Sellyze does not do
Clear scope, no false promises
- Sellyze does not estimate sales volume from BSR rank. It uses marketplace-reported data where available, such as Amazon's "bought in past month" count.
- Sellyze does not do keyword tracking or search volume analysis. It uses keywords for product discovery in the Opportunity Scanner, and generates search terms as part of listing optimization, but does not track keyword rankings over time.
- Sellyze does not manage PPC campaigns, inventory, or refunds. It is a research and optimization tool, not a seller management platform.
- Sellyze is not a sourcing marketplace. It generates supplier briefs, RFQ templates, compliance checklists, and packaging specs to help you negotiate with suppliers, but you still need to find and vet suppliers yourself.
See it in action
Run a free analysis on any product from Amazon, Etsy, or TikTok Shop.
This methodology applies to all Sellyze product analyses and Opportunity Scanner reports. For questions, contact support@sellyze.ai.