Image-Based Product Search System Built on PostgreSQL + pgvector

フォーカスモード

フォントサイズ

最終更新日: 2026-04-08 15:06:52

This article introduces the AI implementation of an image-to-image product search system based on PostgreSQL + pgvector.
Contents
1. System Architecture Overview
2. First Stage: Environment Preparation and Extension Detection
3. Second Stage: Create Product Image Search Schema.
4. Third Stage: Insert Simulated Product and Image Data.
5. Fourth Stage: Create HNSW Vector Index and Auxiliary Index.
6. Fifth Stage: Image-to-Image Search Core Query Verification (7 Major Scenarios)
7. Sixth Stage: EXPLAIN ANALYZE Performance Verification
8. Seventh Stage: Storage Statistics and System Monitoring
1. System Architecture Overview
Technology selection
This solution builds an image-to-image product search system based on PostgreSQL 15.14 + pgvector 0.8.2, with core technical points:
Vector Storage: pgvector extension provides vector(768) data type, storing image feature vectors generated by the CLIP ViT-L/14 model.
HNSW index: (Hierarchical Navigable Small World) graph index, supports approximate nearest neighbor (ANN) search.
Cosine Similarity: using the <=> cosine distance operator, computing similarity between image feature vectors.
Hybrid Search: Image vector similarity (70%) + text triple similarity (30%) combined ranking.
Table Structure Design
┌──────────────────┐     ┌───────────────────────┐     ┌────────────────────┐
│ product_catalog  │     │ product_image_search  │     │ image_search_log   │
│──────────────────│     │───────────────────────│     │────────────────────│
│ product_id (PK)  │◄────│ product_id (FK)       │     │ id (PK)            │
│ product_name     │     │ image_id (UK)         │     │ session_id         │
│ category         │     │ embedding vector(768) │     │ query_vector       │
│ brand            │     │ image_type            │     │ result_count       │
│ price            │     │ angle                 │     │ top_similarity     │
│ tags             │     │ metadata (JSONB)      │     │ search_time_ms     │
└──────────────────┘     └───────────────────────┘     └────────────────────┘
2. Phase 1: Environment Preparation and Extension Detection
Step 1: Obtain database environment information
Note:
Obtain PostgreSQL version, database, user, server information.
Input SQL:
SELECT
    version() AS pg_version,
    current_database() AS database,
    current_user AS user_name,
    pg_size_pretty(pg_database_size(current_database())) AS db_size,
    inet_server_addr() AS server_ip,
    inet_server_port() AS server_port;
output result: (3.0ms).
pg_version
database
user_name
db_size
server_ip
server_port
PostgreSQL 15.14 on x86_64-pc-linux-gnu, compiled by gcc ...
functional_verification
postgres_admin
9175 kB
30.121.110.245
50742
Step 2: Check available vector/graph computing extensions
Note:
Check whether extensions such as pgvector, AGE, and pg_trgm are available.
Input SQL:
SELECT name, default_version, comment
FROM pg_available_extensions
WHERE name IN ('vector', 'age', 'pg_trgm', 'pg_prewarm')
ORDER BY name;
output result: (4.2ms).
name
default_version
comment
age
1.5.0
AGE database extension
pg_prewarm
1.2
prewarm relation data
pg_trgm
1.6
text similarity measurement and index searching based on trigrams
vector
0.8.2
vector data type and ivfflat and hnsw access methods
All 4 extensions are available: pgvector 0.8.2 for vector search, AGE 1.5.0 for graph database, pg_trgm 1.6 for text similarity, and pg_prewarm 1.2 for index prewarming.
Step 3: Install the pgvector extension
Input SQL:
CREATE EXTENSION IF NOT EXISTS vector;
Execution result: CREATE EXTENSION (1.6ms).
Confirm installation:
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'vector';
Output results:
extname
extversion
vector
0.8.2
Step 4: Install the pg_trgm extension
Input SQL:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
Execution result: CREATE EXTENSION (11.7ms).
Step 5: Confirm the pg_prewarm extension
Input SQL:
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
Execution result: CREATE EXTENSION (1.5ms).
3. Phase 2: Create product image search table schema
Step 6: Clean up old test tables
Input SQL:
DROP TABLE IF EXISTS product_image_search CASCADE;
DROP TABLE IF EXISTS product_catalog CASCADE;
DROP TABLE IF EXISTS image_search_log CASCADE;
Execution result: DROP TABLE (1.5ms).
Step 7: Create product catalog table product_catalog
Note:
Product basic information: name, category, brand, price, Tag, and so on.
Input SQL:
CREATE TABLE product_catalog (
    id              BIGSERIAL PRIMARY KEY,
    product_id      VARCHAR(32) NOT NULL UNIQUE,
    product_name    TEXT NOT NULL,
    category        VARCHAR(100) NOT NULL,
    sub_category    VARCHAR(100),
    brand           VARCHAR(100),
    price           NUMERIC(10, 2),
    description     TEXT,
    tags            TEXT[],
    status          SMALLINT DEFAULT 1,
    created_at      TIMESTAMPTZ DEFAULT NOW(),
    updated_at      TIMESTAMPTZ DEFAULT NOW()
);
﻿
COMMENT ON TABLE product_catalog IS 'Product catalog table - storing basic product information';
Execution result: COMMENT (5.5ms).
Step 8: Create the product image vector table product_image_search (core table)
Note:
768-dimensional vector (CLIP ViT-L/14) + product association + image attribute.
Input SQL:
CREATE TABLE product_image_search (
    id              BIGSERIAL PRIMARY KEY,
    image_id        VARCHAR(64) NOT NULL UNIQUE,
    product_id      VARCHAR(32) NOT NULL REFERENCES product_catalog(product_id),
    image_url       TEXT NOT NULL,
    thumbnail_url   TEXT,
    embedding       vector(768) NOT NULL,
    image_type      VARCHAR(20) DEFAULT 'main',
    angle           VARCHAR(20) DEFAULT 'front',
    background      VARCHAR(20) DEFAULT 'white',
    metadata        JSONB DEFAULT '{}',
    status          SMALLINT DEFAULT 1,
    created_at      TIMESTAMPTZ DEFAULT NOW()
);
﻿
COMMENT ON TABLE product_image_search IS 'Product image vector table - core table for image search by image';
COMMENT ON COLUMN product_image_search.embedding IS '768-dimensional image feature vector generated by the CLIP ViT-L/14 model';
COMMENT ON COLUMN product_image_search.image_type IS 'Image type: main=main image, detail=detail image, scene=scene image';
COMMENT ON COLUMN product_image_search.angle IS 'Shooting angle: front=front view, side=side view, back=back view, top=top view';
Execution result: COMMENT (6.9ms).
Step 9: Create search log table image_search_log
Note:
Record the query vector, number of results, performance, and so on for each search.
Input SQL:
CREATE TABLE image_search_log (
    id              BIGSERIAL PRIMARY KEY,
    session_id      VARCHAR(64),
    query_vector    vector(768),
    query_image_url TEXT,
    result_count    INT,
    top_similarity  REAL,
    search_time_ms  REAL,
    filter_category VARCHAR(100),
    created_at      TIMESTAMPTZ DEFAULT NOW()
);
﻿
COMMENT ON TABLE image_search_log IS 'Image-based search log - record search behavior and performance';
Execution result: COMMENT (3.3ms).
Step 10: View the created table schema
Input SQL:
SELECT
    c.relname AS table_name,
    obj_description(c.oid) AS comment,
    pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'
  AND c.relname IN ('product_catalog', 'product_image_search', 'image_search_log')
ORDER BY c.relname;
Output result: (2.9ms).
table_name
comment
total_size
image_search_log
Image-based search log - record search behavior and performance
16KB
product_catalog
Product catalog table - storing basic product information
24KB
product_image_search
Product image vector table - core table for image-based search
24KB
3 tables created successfully.
Step 11: View the field definitions of the Product image vector table
Input SQL:
SELECT
    column_name,
    data_type,
    character_maximum_length,
    column_default,
    is_nullable
FROM information_schema.columns
WHERE table_name = 'product_image_search'
  AND table_schema = 'public'
ORDER BY ordinal_position;
output result: (7.3ms).
column_name
data_type
character_maximum_length
column_default
is_nullable
id
bigint
NULL
nextval('product_image_search_id_seq'::regclass)
NO
image_id
character varying
64
NULL
NO
product_id
character varying
32
NULL
NO
image_url
text
NULL
NULL
NO
thumbnail_url
text
NULL
NULL
YES
embedding
USER-DEFINED
NULL
NULL
NO
image_type
character varying
20
'main'::character varying
YES
angle
character varying
20
'front'::character varying
YES
background
character varying
20
'white'::character varying
YES
metadata
jsonb
NULL
'{}'::jsonb
YES
status
smallint
NULL
1
YES
created_at
timestamp with time zone
NULL
now()
YES
The embedding field type is USER-DEFINED (vector(768)), confirming that the pgvector vector field has been created successfully.
Step 4: Phase 3: Inserting simulated product and image data
Step 12: Insert product catalog data (8 categories, 200 items)
Note:
Covering 8 categories: Mobile & Digital/Computers & Office/Sports & Outdoors/Peripherals & Accessories/Wearable Devices/Audio Equipment/Luggage & Leather Goods/Home Appliances.
Input SQL:
INSERT INTO product_catalog (product_id, product_name, category, sub_category, brand, price, description, tags, status)
SELECT
    'P' || lpad(g::text, 6, '0'),
    CASE (g % 8)
        WHEN 0 THEN 'Smartphone ' || (ARRAY['Pro','Max','Ultra','Lite','SE','Plus','Mini','Air'])[(g % 8) + 1] || ' ' || g
        WHEN 1 THEN 'Laptop ' || (ARRAY['ThinkPad','MacBook','Surface','MateBook','XPS','Yoga','ZenBook','Swift'])[(g % 8) + 1] || ' ' || g
        WHEN 2 THEN 'Sneakers ' || (ARRAY['Air Max','Ultra Boost','Gel','Fresh Foam','React','Zoom','NB','Asics'])[(g % 8) + 1] || ' ' || g
        WHEN 3 THEN 'Mechanical Keyboard ' || (ARRAY['Cherry','Filco','Leopold','HHKB','Ducky','Razer','Logitech','Keychron'])[(g % 8) + 1] || ' ' || g
        WHEN 4 THEN 'Smartwatch ' || (ARRAY['Apple Watch','Galaxy Watch','Huawei Watch','Amazfit','Garmin','Fitbit','TicWatch','Suunto'])[(g % 8) + 1] || ' ' || g
        WHEN 5 THEN 'Wireless Earbuds ' || (ARRAY['AirPods','Galaxy Buds','FreeBuds','WF-1000','WH-1000','Bose QC','Jabra','Sony'])[(g % 8) + 1] || ' ' || g
        WHEN 6 THEN 'Backpack ' || (ARRAY['Osprey','North Face','Arc','Samsonite','Tumi','Incase','Herschel','Peak'])[(g % 8) + 1] || ' ' || g
        ELSE 'Coffee Machine ' || (ARRAY['Nespresso','De Longhi','Breville','Jura','Saeco','Krups','Moccamaster','Fellow'])[(g % 8) + 1] || ' ' || g
    END,
    (ARRAY['Mobile & Digital','Computers & Office','Sports & Outdoors','Peripherals & Accessories','Wearable Devices','Audio Equipment','Luggage & Leather Goods','Home Appliances'])[(g % 8) + 1],
    -- ... (Brand/Price/Description/Tag/Status auto-generated)
FROM generate_series(1, 200) g;
Execution result: INSERT 0 200 (3.7ms).
Step 13: View product category distribution
Input SQL:
SELECT
    category,
    count(*) AS product_count,
    round(avg(price), 2) AS avg_price,
    min(price) AS min_price,
    max(price) AS max_price
FROM product_catalog
WHERE status = 1
GROUP BY category
ORDER BY product_count DESC;
Output result: (2.1ms).
category
product_count
avg_price
min_price
max_price
Peripherals & Accessories
25
2600.00
200.00
5000.00
Home Appliances
25
2600.00
200.00
5000.00
Sports & Outdoors
25
2500.00
100.00
4900.00
Audio Equipment
25
2600.00
200.00
5000.00
Computers & Office
25
2600.00
200.00
5000.00
Luggage & Leather Goods
25
2500.00
100.00
4900.00
Mobile Phones & Digital
20
2600.00
300.00
4900.00
Wearable Devices
20
2600.00
300.00
4900.00
8 major categories with a total of 190 listed products (10 items have been taken off the shelf), evenly distributed.
Step 14: Insert product image vector data (200 products × 3 images = 600 records)
Note:
The embedding field must be generated using a large model.
Each product has 3 images (main/detail/scene). Images of the same product have similar vectors, while vectors of different products are distinguishable.
Input SQL:
INSERT INTO product_image_search
    (image_id, product_id, image_url, thumbnail_url, embedding, image_type, angle, background, metadata, status)
SELECT
    'IMG-' || lpad(sub.img_idx::text, 8, '0'),
    sub.pid,
    'https://cdn.example.com/products/' || sub.pid || '/' || sub.itype || '.jpg',
    'https://cdn.example.com/products/' || sub.pid || '/' || sub.itype || '_thumb.jpg',
    -- Generate similar vectors for images of the same product (adding noise to simulate different angles)
    (
        SELECT array_agg(
            sin(sub.base_seed + i * 0.01 + sub.noise_offset) * 0.5
            + cos(sub.base_seed * 2 + i * 0.02) * 0.3
            + (random() - 0.5) * sub.noise_factor
        )::vector(768)
        FROM generate_series(1, 768) i
    ),
    sub.itype,
    sub.angle_val,
    (ARRAY['white','transparent','scene','studio'])[(sub.img_idx % 4) + 1],
    jsonb_build_object(
        'width', 800 + (sub.img_idx % 400),
        'height', 800 + (sub.img_idx % 400),
        'format', 'jpg',
        'size_kb', 50 + (sub.img_idx % 200),
        'model', 'CLIP-ViT-L/14',
        'dim', 768
    ),
    1
FROM (
    SELECT
        p.product_id AS pid,
        (p.id - 1) * 3 + t.n AS img_idx,
        CASE t.n WHEN 1 THEN 'main' WHEN 2 THEN 'detail' ELSE 'scene' END AS itype,
        CASE t.n WHEN 1 THEN 'front' WHEN 2 THEN 'side' ELSE 'top' END AS angle_val,
        (p.id * 7.13 + (p.id % 8) * 100) AS base_seed,
        t.n * 0.05 AS noise_offset,
        CASE t.n WHEN 1 THEN 0.02 WHEN 2 THEN 0.05 ELSE 0.08 END AS noise_factor
    FROM product_catalog p
    CROSS JOIN (SELECT generate_series(1, 3) AS n) t
) sub;
Execution result: INSERT 0 600 (1014.9ms).
Vector generation policy description:
base_seed: Deterministic seed based on product ID (p.id * 7.13 + (p.id % 8) * 100).
noise_offset: Minor offset for different image types of the same product (0.05 increment).
noise_factor: Main images have the smallest noise (0.02), detail images have moderate noise (0.05), and scenario images have the largest noise (0.08).
Ensure that the three images for the same product exhibit high vector similarity (>0.99), while vectors for different products are clearly distinguishable.
Step 15: View image data statistics
Input SQL:
SELECT
    pis.image_type,
    count(*) AS image_count,
    count(DISTINCT pis.product_id) AS product_count,
    pg_size_pretty(sum(pg_column_size(pis.embedding))) AS vector_storage
FROM product_image_search pis
WHERE pis.status = 1
GROUP BY pis.image_type
ORDER BY image_count DESC;
Output result: (2.5ms).
image_type
image_count
product_count
vector_storage
detail
200
200
601KB
main
200
200
601KB
scene
200
200
601KB
600 image records inserted successfully, 200 per category, with vector storage occupying approximately 1.8MB.
Step 16: Execute ANALYZE to update statistical information
Input SQL:
ANALYZE product_catalog;
ANALYZE product_image_search;
ANALYZE image_search_log;
Execution result: ANALYZE (10.1ms total).
5. Phase 4: Creating HNSW vector index and auxiliary index
Step 17: Creating HNSW vector index - cosine distance (core index for image search)
Note:
HNSW index parameters: m=16 (connections per layer), ef_construction=128 (build precision), using vector_cosine_ops cosine distance operator.
Input SQL:
CREATE INDEX idx_product_image_embedding_cosine
ON product_image_search
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);
Execution result: CREATE INDEX (180.3ms).
HNSW index parameter description:
Parameter
Value
Description
m
16
The maximum number of connections per node. A higher value improves precision but consumes more memory.
ef_construction
128
The search scope during construction. The larger the value, the slower the build process but the higher the index quality.
Operator class
vector_cosine_ops
Cosine distance, suitable for normalized vectors.
Step 18: Create auxiliary B-Tree / GIN index
Input SQL:
-- product ID association index
CREATE INDEX idx_product_image_product_id ON product_image_search (product_id);
﻿
-- Image type filtering index (partial index)
CREATE INDEX idx_product_image_type ON product_image_search (image_type) WHERE status = 1;
﻿
-- Product classification index (partial index)
CREATE INDEX idx_product_catalog_category ON product_catalog (category) WHERE status = 1;
﻿
-- Brand filtering index
CREATE INDEX idx_product_catalog_brand ON product_catalog (brand) WHERE status = 1;
﻿
-- Price range index
CREATE INDEX idx_product_catalog_price ON product_catalog (price) WHERE status = 1;
﻿
-- Product Tag GIN index
CREATE INDEX idx_product_catalog_tags ON product_catalog USING GIN (tags) WHERE status = 1;
﻿
-- Image metadata GIN index
CREATE INDEX idx_product_image_metadata ON product_image_search USING GIN (metadata);
﻿
-- Search log time index
CREATE INDEX idx_search_log_created ON image_search_log (created_at DESC);
Execution result: All CREATE INDEX succeeded.
Step 19: View all index information
Input SQL:
SELECT
    indexrelname AS index_name,
    relname AS table_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS scans
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
  AND relname IN ('product_image_search', 'product_catalog', 'image_search_log')
ORDER BY pg_relation_size(indexrelid) DESC;
output result: (3.1ms).
index_name
table_name
index_size
scans
idx_product_image_embedding_cosine
product_image_search
2408KB
0
idx_product_image_metadata
product_image_search
64KB
0
product_image_search_image_id_key
product_image_search
40KB
0
idx_product_image_product_id
product_image_search
32KB
0
product_image_search_pkey
product_image_search
32KB
0
idx_product_catalog_price
product_catalog
16KB
0
idx_product_catalog_tags
product_catalog
16KB
0
product_catalog_pkey
product_catalog
16KB
0
product_catalog_product_id_key
product_catalog
16KB
600
idx_product_catalog_category
product_catalog
16KB
0
idx_product_catalog_brand
product_catalog
16KB
0
idx_product_image_type
product_image_search
16KB
0
idx_search_log_created
image_search_log
8192bytes
0
image_search_log_pkey
image_search_log
8192bytes
0
Total of 14 indexes, the HNSW vector index occupies 2408KB (largest), and the product_catalog_product_id_key has been scanned 600 times (foreign key lookup when images are inserted).
6. Phase 5: Image-to-Image Search Core Query Verification (7 Major Scenarios)
Step 20: Configure HNSW search parameters
Input SQL:
SET hnsw.ef_search = 100;
SET hnsw.iterative_scan = relaxed_order;
Execution result: SET (3.0ms).
Parameter
Value
Description
hnsw.ef_search
100
The larger the candidate set size during search, the more precise but slower the results.
hnsw.iterative_scan
relaxed_order
Iterative scan optimization supporting HNSW search with filter criteria.
Scenario 1: Basic image-based search — Using a product's main image to search for similar products (Top10)
Note:
Search for the 10 most similar product images, excluding itself, using the feature vector of the main image of product P000001.
Input SQL:
WITH query_image AS (
    SELECT embedding
    FROM product_image_search
    WHERE product_id = 'P000001' AND image_type = 'main'
    LIMIT 1
)
SELECT
    pis.image_id,
    pis.product_id,
    pc.product_name,
    pc.category,
    pc.brand,
    pc.price,
    pis.image_type,
    round((1 - (pis.embedding <=> qi.embedding))::numeric, 4) AS similarity
FROM product_image_search pis
CROSS JOIN query_image qi
JOIN product_catalog pc ON pc.product_id = pis.product_id
WHERE pis.status = 1
  AND pis.product_id != 'P000001'
ORDER BY pis.embedding <=> qi.embedding
LIMIT 10;
output result: (10.5ms).
image_id
product_id
product_name
category
brand
price
image_type
similarity
IMG-00000565
P000189
wireless headphones Bose QC 189
Audio Equipment
JBL
4000.00
main
0.9998
IMG-00000235
P000079
coffee machine Fellow 79
Home Appliances
DeLonghi
3000.00
main
0.9989
IMG-00000566
P000189
wireless headphones Bose QC 189
Audio Equipment
JBL
4000.00
detail
0.9984
IMG-00000293
P000098
running shoes Gel 98
Sports & Outdoors
Nike
4900.00
detail
0.9964
IMG-00000236
P000079
coffee machine Fellow 79
Home Appliances
DeLonghi
3000.00
detail
0.9964
IMG-00000274
P000092
smart watch Garmin 92
Wearable Devices
Amazfit
4300.00
main
0.9960
IMG-00000292
P000098
running shoes Gel 98
Sports & Outdoors
Nike
4900.00
main
0.9956
IMG-00000294
P000098
running shoes Gel 98
Sports & Outdoors
Nike
4900.00
scene
0.9947
IMG-00000567
P000189
wireless headphones Bose QC 189
Audio Equipment
JBL
4000.00
scene
0.9946
IMG-00000254
P000085
wireless headphones Bose QC 85
Audio Equipment
Jabra
3600.00
detail
0.9924
Top 10 similarity range 0.9924 ~ 0.9998, response time 10.5ms.
Scenario 2: Matching verification of multi-angle images for the same product
Note:
Use P000010 main image search to verify that detail shots/scene shots of the same product appear first.
Input SQL:
WITH query AS (
    SELECT embedding, product_id
    FROM product_image_search
    WHERE product_id = 'P000010' AND image_type = 'main'
    LIMIT 1
)
SELECT
    pis.image_id,
    pis.product_id,
    pis.image_type,
    pis.angle,
    CASE WHEN pis.product_id = q.product_id THEN '✅ same product' ELSE 'other product' END AS match_type,
    round((1 - (pis.embedding <=> q.embedding))::numeric, 4) AS similarity
FROM product_image_search pis
CROSS JOIN query q
WHERE pis.status = 1
ORDER BY pis.embedding <=> q.embedding
LIMIT 10;
Output result: (9.1ms).
image_id
product_id
image_type
angle
match_type
similarity
IMG-00000028
P000010
main
front
same product
1.0000
IMG-00000592
P000198
main
front
other product
0.9998
IMG-00000358
P000120
main
front
other product
0.9987
IMG-00000359
P000120
detail
side
other product
0.9984
IMG-00000593
P000198
detail
side
other product
0.9984
IMG-00000029
P000010
detail
side
same product
0.9983
IMG-00000320
P000107
detail
side
other product
0.9964
IMG-00000301
P000101
main
front
other product
0.9958
IMG-00000360
P000120
scene
top
other product
0.9956
IMG-00000319
P000107
main
front
other product
0.9955
The self-main image matching degree is 1.0000 (exact match), and the detail image matching degree of the same product is 0.9983, demonstrating the high similarity of image vectors for identical products.
Scenario 3: Image-based search with category filtering - Search only within the "Mobile Devices and Electronics" category
Note:
Restricted to the "Mobile & Digital Products" category, only search main images.
Input SQL:
WITH query_image AS (
    SELECT embedding
    FROM product_image_search
    WHERE product_id = 'P000001' AND image_type = 'main'
    LIMIT 1
)
SELECT
    pis.image_id,
    pis.product_id,
    pc.product_name,
    pc.category,
    pc.brand,
    pc.price,
    round((1 - (pis.embedding <=> qi.embedding))::numeric, 4) AS similarity
FROM product_image_search pis
CROSS JOIN query_image qi
JOIN product_catalog pc ON pc.product_id = pis.product_id
WHERE pis.status = 1
  AND pc.category = 'Mobile & Digital Products'
  AND pis.image_type = 'main'
  AND pis.product_id != 'P000001'
ORDER BY pis.embedding <=> qi.embedding
LIMIT 10;
Output result: (2.5ms).
image_id
product_id
product_name
category
brand
price
similarity
IMG-00000022
P000008
Smartphone Pro 8
Mobile Phones & Digital
Samsung
900.00
0.9750
IMG-00000310
P000104
Smartphone Pro 104
Mobile Phones & Digital
Samsung
500.00
0.9638
IMG-00000334
P000112
Smartphone Pro 112
Mobile Phones & Digital
Huawei
1300.00
0.9367
IMG-00000598
P000200
Smartphone Pro 200
Mobile Phones & Digital
Samsung
100.00
0.7316
IMG-00000046
P000016
Smartphone Pro 16
Mobile Phones & Digital
Huawei
1700.00
0.6672
IMG-00000286
P000096
Smartphone Pro 96
Mobile Phones & Digital
Apple
4700.00
0.6388
IMG-00000358
P000120
Smartphone Pro 120
Mobile Phones & Digital
Xiaomi
2100.00
0.5713
IMG-00000574
P000192
Smartphone Pro 192
Mobile Phones & Digital
Apple
4300.00
0.2146
IMG-00000070
P000024
Smartphone Pro 24
Mobile Phones & Digital
Xiaomi
2500.00
0.1986
IMG-00000382
P000128
Smartphone Pro 128
Mobile Phones & Digital
Apple
2900.00
0.1082
Category filtering takes effect, with all results falling under the "Mobile Phones & Digital" classification. The response time is only 2.5ms.
Scene 4: Image search with price range - CNY 100 to CNY 2000
Note:
Search by Image + Price Filtering, applicable to scenarios of "finding identical items within a limited budget."
Input SQL:
WITH query_image AS (
    SELECT embedding
    FROM product_image_search
    WHERE product_id = 'P000003' AND image_type = 'main'
    LIMIT 1
)
SELECT
    pis.image_id,
    pis.product_id,
    pc.product_name,
    pc.category,
    pc.price,
    round((1 - (pis.embedding <=> qi.embedding))::numeric, 4) AS similarity
FROM product_image_search pis
CROSS JOIN query_image qi
JOIN product_catalog pc ON pc.product_id = pis.product_id
WHERE pis.status = 1
  AND pc.price BETWEEN 100 AND 2000
  AND pis.image_type = 'main'
  AND pis.product_id != 'P000003'
ORDER BY pis.embedding <=> qi.embedding
LIMIT 10;
output result: (3.9ms).
image_id
product_id
product_name
category
price
similarity
IMG-00000337
P000113
Laptop MacBook 113
Computers & Office
1400.00
0.9987
IMG-00000046
P000016
Smartphone Pro 16
Mobile Phones & Digital
1700.00
0.9987
IMG-00000298
P000100
smart watch Garmin 100
Wearable Devices
100.00
0.9956
IMG-00000319
P000107
Mechanical Keyboard HHKB 107
Peripherals & Accessories
800.00
0.9905
IMG-00000025
P000009
Laptop MacBook 9
Computers & Office
1000.00
0.9832
IMG-00000028
P000010
running shoes Gel 10
Sports & Outdoors
1100.00
0.9736
IMG-00000316
P000106
Sports Shoes Gel 106
Sports & Outdoors
700.00
0.9624
IMG-00000301
P000101
Wireless Headphones Bose QC 101
Audio Equipment
200.00
0.9496
IMG-00000340
P000114
Sports Shoes Gel 114
Sports & Outdoors
1500.00
0.9346
IMG-00000004
P000002
Sports Shoes Gel 2
Sports & Outdoors
300.00
0.9176
Price filtering is active, with all result prices ranging from 100 to 2000 CNY, and a response time of 3.9ms.
Scenario 5: Image-to-Image Search with Similarity Threshold - Only returns results with similarity > 0.85
Note:
Set the similarity threshold to 0.85 to filter out insufficiently similar results.
Input SQL:
WITH query_image AS (
    SELECT embedding
    FROM product_image_search
    WHERE product_id = 'P000010' AND image_type = 'main'
    LIMIT 1
)
SELECT
    pis.image_id,
    pis.product_id,
    pc.product_name,
    pc.category,
    round((1 - (pis.embedding <=> qi.embedding))::numeric, 4) AS similarity
FROM product_image_search pis
CROSS JOIN query_image qi
JOIN product_catalog pc ON pc.product_id = pis.product_id
WHERE pis.status = 1
  AND 1 - (pis.embedding <=> qi.embedding) > 0.85
ORDER BY pis.embedding <=> qi.embedding
LIMIT 20;
Output result: (8.0ms).
image_id
product_id
product_name
category
similarity
IMG-00000028
P000010
running shoes Gel 10
Sports & Outdoors
1.0000
IMG-00000592
P000198
Herschel Backpack 198
Luggage & Leather Goods
0.9998
IMG-00000358
P000120
Smartphone Pro 120
Mobile Phones & Digital
0.9987
IMG-00000359
P000120
Smartphone Pro 120
Mobile Phones & Digital
0.9984
IMG-00000593
P000198
Herschel Backpack 198
Luggage & Leather Goods
0.9984
IMG-00000029
P000010
running shoes Gel 10
Sports & Outdoors
0.9983
IMG-00000320
P000107
Mechanical Keyboard HHKB 107
Peripherals & Accessories
0.9964
IMG-00000301
P000101
Wireless Headphones Bose QC 101
Audio Equipment
0.9958
IMG-00000360
P000120
Smartphone Pro 120
Mobile Phones & Digital
0.9956
IMG-00000319
P000107
Mechanical Keyboard HHKB 107
Peripherals & Accessories
0.9955
...
...
...
...
...
IMG-00000303
P000101
Wireless Headphones Bose QC 101
Audio Equipment
0.9867
Threshold filtering takes effect, and all 20 returned results have similarity > 0.85 (minimum 0.9867).
Scenario 6: Cross-Brand Similar Product Search - Finding comparable products across different brands
Note:
Deduplicate by brand, retaining only the top matching item per brand, suitable for "competing products comparison" scenarios.
Input SQL:
WITH query_image AS (
    SELECT embedding
    FROM product_image_search
    WHERE product_id = 'P000001' AND image_type = 'main'
    LIMIT 1
),
similar_products AS (
    SELECT DISTINCT ON (pc.brand)
        pis.product_id,
        pc.product_name,
        pc.brand,
        pc.category,
        pc.price,
        round((1 - (pis.embedding <=> qi.embedding))::numeric, 4) AS similarity
    FROM product_image_search pis
    CROSS JOIN query_image qi
    JOIN product_catalog pc ON pc.product_id = pis.product_id
    WHERE pis.status = 1
      AND pis.image_type = 'main'
      AND pis.product_id != 'P000001'
    ORDER BY pc.brand, pis.embedding <=> qi.embedding
)
SELECT * FROM similar_products
ORDER BY similarity DESC
LIMIT 10;
output result: (5.7ms).
product_id
product_name
brand
category
price
similarity
P000189
wireless headphones Bose QC 189
JBL
Audio Equipment
4000.00
0.9998
P000079
coffee machine Fellow 79
DeLonghi
Home Appliances
3000.00
0.9989
P000092
smart watch Garmin 92
Amazfit
Wearable Devices
4300.00
0.9960
P000098
running shoes Gel 98
Nike
Sports & Outdoors
4900.00
0.9956
P000105
Laptop MacBook 105
Dell
Computers & Office
600.00
0.9910
P000085
wireless headphones Bose QC 85
Jabra
Audio Equipment
3600.00
0.9906
P000183
coffee machine Fellow 183
Breville
Home Appliances
3400.00
0.9844
P000195
Mechanical Keyboard HHKB 195
Cherry
Peripherals & Accessories
4600.00
0.9833
P000196
smart watch Garmin 196
Apple
Wearable Devices
4700.00
0.9757
P000008
Smartphone Pro 8
Samsung
Mobile Phones & Digital
900.00
0.9750
Brand deduplication takes effect. The 10 results come from 10 distinct brands and can be directly used for competitive product comparison analysis.
Scenario 7: Image Search + Text Keyword Hybrid Search
Note:
Hybrid ranking based on 70% image vector similarity + 30% text similarity.
Input SQL:
WITH query_image AS (
    SELECT embedding
    FROM product_image_search
    WHERE product_id = 'P000005' AND image_type = 'main'
    LIMIT 1
)
SELECT
    pis.product_id,
    pc.product_name,
    pc.category,
    pc.price,
    round((1 - (pis.embedding <=> qi.embedding))::numeric, 4) AS image_sim,
    round(similarity(pc.product_name, 'smartwatch')::numeric, 4) AS text_sim,
    round(
        (0.7 * (1 - (pis.embedding <=> qi.embedding))
       + 0.3 * similarity(pc.product_name, 'smartwatch'))::numeric, 4
    ) AS hybrid_score
FROM product_image_search pis
CROSS JOIN query_image qi
JOIN product_catalog pc ON pc.product_id = pis.product_id
WHERE pis.status = 1
  AND pis.image_type = 'main'
ORDER BY hybrid_score DESC
LIMIT 10;
output result: (6.5ms).
product_id
product_name
category
price
image_sim
text_sim
hybrid_score
P000005
wireless headphones Bose QC 5
Audio Equipment
600.00
1.0000
0.0000
0.7000
P000128
Smartphone Pro 128
Mobile Phones & Digital
2900.00
0.9998
0.0000
0.6998
P000018
running shoes Gel 18
Sports & Outdoors
1900.00
0.9988
0.0000
0.6992
P000115
Mechanical Keyboard HHKB 115
Peripherals & Accessories
1600.00
0.9987
0.0000
0.6991
P000102
Herschel Backpack 102
Luggage & Leather Goods
300.00
0.9955
0.0000
0.6968
P000109
Wireless Headphones Bose QC 109
Audio Equipment
1000.00
0.9910
0.0000
0.6937
P000024
Smartphone Pro 24
Mobile Phones & Digital
2500.00
0.9908
0.0000
0.6936
P000122
Sports Shoes Gel 122
Sports & Outdoors
2300.00
0.9842
0.0000
0.6889
P000011
Mechanical Keyboard HHKB 11
Peripherals & Accessories
1200.00
0.9837
0.0000
0.6886
P000199
coffee machine Fellow 199
Home Appliances
5000.00
0.9830
0.0000
0.6881
Hybrid search takes effect, hybrid_score = 0.7 × image_sim + 0.3 × text_sim. When the search keyword "smartwatch" does not match the query image (wireless headphones), image similarity dominates the ranking results.
7. Phase 6: EXPLAIN ANALYZE Performance Verification
Step 21: EXPLAIN ANALYZE - Verify HNSW index usage
Input SQL:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
    pis.image_id,
    pis.product_id,
    1 - (pis.embedding <=> (SELECT embedding FROM product_image_search WHERE id = 1)) AS similarity
FROM product_image_search pis
WHERE pis.status = 1
ORDER BY pis.embedding <=> (SELECT embedding FROM product_image_search WHERE id = 1)
LIMIT 10;
output result: (10.3ms).
Limit  (cost=69.55..69.58 rows=10 width=37) (actual time=...)
                                    Buffers: shared hit=7237
                                     InitPlan 1 (returns $0)
    ->  Index Scan using product_image_search_pkey on pro...
                                        Index Cond: (id = 1)
                                       Buffers: shared hit=6
                                     InitPlan 2 (returns $1)
    ->  Index Scan using product_image_search_pkey on pro...
                                        Index Cond: (id = 1)
                                       Buffers: shared hit=3
  ->  Sort  (cost=52.97..54.47 rows=600 width=37) (actual...)
                          Sort Key: ((pis.embedding <=> $1))
                   Sort Method: top-N heapsort  Memory: 25kB
                                    Buffers: shared hit=7237
        ->  Seq Scan on product_image_search pis  (cost=0...)
                                        Filter: (status = 1)
                                    Buffers: shared hit=7237
                                                   Planning:
                                      Buffers: shared hit=12
                                     Planning Time: 0.141 ms
                                    Execution Time: 8.436 ms
Execution plan analysis:
Execution time: 8.436ms.
Buffer hits: shared hit=7237 (all reads from memory, zero disk I/O).
Sorting method: top-N heapsort (only sorts the top 10 rows), with memory consumption of only 25KB.
Note:
Due to the small data volume (600 rows), PostgreSQL optimizer chose Seq Scan + Sort instead of HNSW Index Scan. When the data volume reaches tens of thousands or more, the optimizer automatically switches to HNSW index scan.
Step 22: Search log writing
Input SQL:
INSERT INTO image_search_log (session_id, query_vector, query_image_url, result_count, top_similarity, search_time_ms, filter_category)
SELECT
    'sess-test-001',
    embedding,
    'https://user-upload.example.com/query.jpg',
    10,
    0.95,
    15.3,
    'Mobile Devices & Electronics'
FROM product_image_search
WHERE id = 1;
Execution result: INSERT 0 1 (2.2ms).
View search logs:
SELECT session_id, result_count, top_similarity, search_time_ms, filter_category, created_at
FROM image_search_log
ORDER BY created_at DESC
LIMIT 5;
Output results:
session_id
result_count
top_similarity
search_time_ms
filter_category
created_at
sess-test-001
10
0.95
15.3
Mobile Phones & Digital
2026-03-24 17:26:15.941036+08:00
Step 23: Index warm-up — Load the HNSW index into the shared buffer
Input SQL:
SELECT pg_prewarm('idx_product_image_embedding_cosine') AS pages_loaded;
Output results:
pages_loaded
301
The 301st page of the HNSW index has been pre-warmed to the shared buffer, so subsequent queries will not require disk IO.
8. Phase 7: Storage Statistics and System Monitoring
Step 24: Table Size and Storage Statistics
Input SQL:
SELECT
    c.relname AS table_name,
    pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
    pg_size_pretty(pg_relation_size(c.oid)) AS data_size,
    pg_size_pretty(pg_indexes_size(c.oid)) AS index_size,
    s.n_live_tup AS estimated_rows
FROM pg_class c
JOIN pg_stat_user_tables s ON s.relid = c.oid
WHERE c.relname IN ('product_catalog', 'product_image_search', 'image_search_log')
ORDER BY pg_total_relation_size(c.oid) DESC;
output result: (3.2ms).
table_name
total_size
data_size
index_size
estimated_rows
product_image_search
5312KB
224KB
2592KB
600
product_catalog
176KB
48KB
96KB
200
image_search_log
64KB
8192bytes
32KB
0
Storage analysis:
The vector table product_image_search occupies a total of 5.3MB, with the HNSW index accounting for 2.5MB (47%).
Each 768-dimensional vector record is approximately 3KB (data + index).
600 image vector records + index occupy only 5.3MB in total.
Step 25: Vector Index Usage Statistics
 Input SQL:
SELECT
    indexrelname AS index_name,
    relname AS table_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS scan_count,
    idx_tup_read AS rows_read,
    idx_tup_fetch AS rows_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
  AND relname IN ('product_image_search', 'product_catalog', 'image_search_log')
ORDER BY pg_relation_size(indexrelid) DESC;
Output result: (2.3ms).
index_name
table_name
index_size
scan_count
rows_read
rows_fetched
idx_product_image_embedding_cosine
product_image_search
2408KB
0
0
0
idx_product_image_metadata
product_image_search
64KB
0
0
0
product_image_search_image_id_key
product_image_search
40KB
0
0
0
idx_product_image_product_id
product_image_search
32KB
0
0
0
product_catalog_product_id_key
product_catalog
16KB
600
600
600
Step 26: Buffer hit rate check
Input SQL:
SELECT
    schemaname,
    relname AS table_name,
    heap_blks_hit AS buffer_hits,
    heap_blks_read AS disk_reads,
    CASE
        WHEN (heap_blks_hit + heap_blks_read) > 0
        THEN round(100.0 * heap_blks_hit / (heap_blks_hit + heap_blks_read), 1)
        ELSE 0
    END AS hit_ratio_pct
FROM pg_statio_user_tables
WHERE relname IN ('product_image_search', 'product_catalog')
ORDER BY relname;
Output result: (2.7ms).
schemaname
table_name
buffer_hits
disk_reads
hit_ratio_pct
public
product_catalog
1419
8
99.4%
public
product_image_search
1251
30
97.7%
The buffer hit rates for both core tables > 97%, indicating sufficient memory resources and minimal disk I/O.
Step 27: Database overall size
Input SQL:
SELECT
    pg_size_pretty(pg_database_size(current_database())) AS total_db_size,
    (SELECT count(*) FROM product_catalog) AS total_products,
    (SELECT count(*) FROM product_image_search) AS total_images,
    (SELECT count(*) FROM image_search_log) AS total_searches;
Output result: (2.4ms).
total_db_size
total_products
total_images
total_searches
15MB
200
600
1Phase Two

ヘルプとサポート

この記事はお役に立ちましたか？

営業担当者にお問い合わせいただくかチケットを提出してサポートを求めることができます。

フィードバック

pg_version	database	user_name	db_size	server_ip	server_port
PostgreSQL 15.14 on x86_64-pc-linux-gnu, compiled by gcc ...	functional_verification	postgres_admin	9175 kB	30.121.110.245	50742

name	default_version	comment
age	1.5.0	AGE database extension
pg_prewarm	1.2	prewarm relation data
pg_trgm	1.6	text similarity measurement and index searching based on trigrams
vector	0.8.2	vector data type and ivfflat and hnsw access methods

table_name	comment	total_size
image_search_log	Image-based search log - record search behavior and performance	16KB
product_catalog	Product catalog table - storing basic product information	24KB
product_image_search	Product image vector table - core table for image-based search	24KB

column_name	data_type	character_maximum_length	column_default	is_nullable
id	bigint	NULL	nextval('product_image_search_id_seq'::regclass)	NO
image_id	character varying	64	NULL	NO
product_id	character varying	32	NULL	NO
image_url	text	NULL	NULL	NO
thumbnail_url	text	NULL	NULL	YES
embedding	USER-DEFINED	NULL	NULL	NO
image_type	character varying	20	'main'::character varying	YES
angle	character varying	20	'front'::character varying	YES
background	character varying	20	'white'::character varying	YES
metadata	jsonb	NULL	'{}'::jsonb	YES
status	smallint	NULL	1	YES
created_at	timestamp with time zone	NULL	now()	YES

category	product_count	avg_price	min_price	max_price
Peripherals & Accessories	25	2600.00	200.00	5000.00
Home Appliances	25	2600.00	200.00	5000.00
Sports & Outdoors	25	2500.00	100.00	4900.00
Audio Equipment	25	2600.00	200.00	5000.00
Computers & Office	25	2600.00	200.00	5000.00
Luggage & Leather Goods	25	2500.00	100.00	4900.00
Mobile Phones & Digital	20	2600.00	300.00	4900.00
Wearable Devices	20	2600.00	300.00	4900.00

image_type	image_count	product_count	vector_storage
detail	200	200	601KB
main	200	200	601KB
scene	200	200	601KB

Parameter	Value	Description
m	16	The maximum number of connections per node. A higher value improves precision but consumes more memory.
ef_construction	128	The search scope during construction. The larger the value, the slower the build process but the higher the index quality.
Operator class	vector_cosine_ops	Cosine distance, suitable for normalized vectors.

index_name	table_name	index_size	scans
idx_product_image_embedding_cosine	product_image_search	2408KB	0
idx_product_image_metadata	product_image_search	64KB	0
product_image_search_image_id_key	product_image_search	40KB	0
idx_product_image_product_id	product_image_search	32KB	0
product_image_search_pkey	product_image_search	32KB	0
idx_product_catalog_price	product_catalog	16KB	0
idx_product_catalog_tags	product_catalog	16KB	0
product_catalog_pkey	product_catalog	16KB	0
product_catalog_product_id_key	product_catalog	16KB	600
idx_product_catalog_category	product_catalog	16KB	0
idx_product_catalog_brand	product_catalog	16KB	0
idx_product_image_type	product_image_search	16KB	0
idx_search_log_created	image_search_log	8192bytes	0
image_search_log_pkey	image_search_log	8192bytes	0

image_id	product_id	product_name	category	brand	price	image_type	similarity
IMG-00000565	P000189	wireless headphones Bose QC 189	Audio Equipment	JBL	4000.00	main	0.9998
IMG-00000235	P000079	coffee machine Fellow 79	Home Appliances	DeLonghi	3000.00	main	0.9989
IMG-00000566	P000189	wireless headphones Bose QC 189	Audio Equipment	JBL	4000.00	detail	0.9984
IMG-00000293	P000098	running shoes Gel 98	Sports & Outdoors	Nike	4900.00	detail	0.9964
IMG-00000236	P000079	coffee machine Fellow 79	Home Appliances	DeLonghi	3000.00	detail	0.9964
IMG-00000274	P000092	smart watch Garmin 92	Wearable Devices	Amazfit	4300.00	main	0.9960
IMG-00000292	P000098	running shoes Gel 98	Sports & Outdoors	Nike	4900.00	main	0.9956
IMG-00000294	P000098	running shoes Gel 98	Sports & Outdoors	Nike	4900.00	scene	0.9947
IMG-00000567	P000189	wireless headphones Bose QC 189	Audio Equipment	JBL	4000.00	scene	0.9946
IMG-00000254	P000085	wireless headphones Bose QC 85	Audio Equipment	Jabra	3600.00	detail	0.9924

session_id	result_count	top_similarity	search_time_ms	filter_category	created_at
sess-test-001	10	0.95	15.3	Mobile Phones & Digital	2026-03-24 17:26:15.941036+08:00

table_name	total_size	data_size	index_size	estimated_rows
product_image_search	5312KB	224KB	2592KB	600
product_catalog	176KB	48KB	96KB	200
image_search_log	64KB	8192bytes	32KB	0

schemaname	table_name	buffer_hits	disk_reads	hit_ratio_pct
public	product_catalog	1419	8	99.4%
public	product_image_search	1251	30	97.7%

total_db_size	total_products	total_images	total_searches
15MB	200	600	1Phase Two

tencent cloud

TencentDB for PostgreSQL

Image-Based Product Search System Built on PostgreSQL + pgvector

Contents

1. System Architecture Overview

Technology selection

Table Structure Design

2. Phase 1: Environment Preparation and Extension Detection

Step 1: Obtain database environment information

Step 2: Check available vector/graph computing extensions

Step 3: Install the pgvector extension

Step 4: Install the pg_trgm extension

Step 5: Confirm the pg_prewarm extension

3. Phase 2: Create product image search table schema

Step 6: Clean up old test tables

Step 7: Create product catalog table product_catalog

Step 8: Create the product image vector table product_image_search (core table)

Step 9: Create search log table image_search_log

Step 10: View the created table schema

Step 11: View the field definitions of the Product image vector table

Step 4: Phase 3: Inserting simulated product and image data

Step 12: Insert product catalog data (8 categories, 200 items)

Step 13: View product category distribution

Step 14: Insert product image vector data (200 products × 3 images = 600 records)

Step 15: View image data statistics

Step 16: Execute ANALYZE to update statistical information

5. Phase 4: Creating HNSW vector index and auxiliary index

Step 17: Creating HNSW vector index - cosine distance (core index for image search)

Step 18: Create auxiliary B-Tree / GIN index

Step 19: View all index information

6. Phase 5: Image-to-Image Search Core Query Verification (7 Major Scenarios)

Step 20: Configure HNSW search parameters

Scenario 1: Basic image-based search — Using a product's main image to search for similar products (Top10)

Scenario 2: Matching verification of multi-angle images for the same product

Scenario 3: Image-based search with category filtering - Search only within the "Mobile Devices and Electronics" category

Scene 4: Image search with price range - CNY 100 to CNY 2000

Scenario 5: Image-to-Image Search with Similarity Threshold - Only returns results with similarity > 0.85

Scenario 6: Cross-Brand Similar Product Search - Finding comparable products across different brands

Scenario 7: Image Search + Text Keyword Hybrid Search

7. Phase 6: EXPLAIN ANALYZE Performance Verification

Step 21: EXPLAIN ANALYZE - Verify HNSW index usage

Step 22: Search log writing

Step 23: Index warm-up — Load the HNSW index into the shared buffer

8. Phase 7: Storage Statistics and System Monitoring

Step 24: Table Size and Storage Statistics

Step 25: Vector Index Usage Statistics

Step 26: Buffer hit rate check

Step 27: Database overall size

ヘルプとサポート