当前位置：文档之家› PARSING ANS TAGGING OF BINLINDUAL DICTIONARY

PARSING ANS TAGGING OF BINLINDUAL DICTIONARY

September10,2003 LAMP-TR-106

CAR-TR-991

CS-TR-4529

UMIACS-TR-2003-97

PARSING AND TAGGING OF BINLINGUAL

DICTIONARY

HUANFENG MA

BURCU KARAGOL-AYAN

DAVID DOERMANN

DOUG OARD

JIANQIANG WANG

LAMP-TR-106

UMIACS-TR-2003-97

CAR-TR-991

CS-TR-4529

PARSING ANS TAGGING OF BINLINDUAL

DICTIONARY

HUANFENG MA1,2

BURCU KARAGOL-AYAN1,2

DAVID DOERMANN1,2

DOUG OARD2,3

JIANQIANG WANG2,3

1Language and Media Processing Laboratory

2Institute for Advanced Computer Studies

3College of Information Studies

University of Maryland,College Park,MD20742-3275

{hfma,burcu,doermann}@https://www.doczj.com/doc/1b2154864.html,

{oard,wangjq}@https://www.doczj.com/doc/1b2154864.html,

Abstract

Bilingual dictionaries hold great potential as a source of lexical resources for training and testing automated systems for optical character recognition,machine translation,and cross-language information retrieval.In this paper,we describe a system for extracting term lexicons from printed bilingual dictionaries.Our work was divided into three phases -dictionary segmentation,entry tagging,and generation.In segmentation,pages are divided into logical entries based on structural features learned from selected examples. The extracted entries are associated with functional labels and passed to a tagging module which associates linguistic labels with each word or phrase in the entry.The output of the system is a structure that represents the entries from the dictionary.We have used this approach to parse a variety of dictionaries with both Latin and non-Latin alphabets, and demonstrate the results of term lexicon generation for retrieval from a collection of French news stories using English queries.

Keywords:Cross-Language IR,OCR,Logical Analysis,Page Segmentation, Bilingual Dictionaries

The support of this research under DARPA cooperative agreement N660010028910and DOD con-tract MDA90402C0406is gratefully acknowledged.

1Introduction

1.1The Problem

In recent years,the demand for tools capable searching written and spoken sources of multilingual information has increased tremendously.The involvement of multinational groups in activities all over the world,the proliferation of information on the Inter-net,and the general explosion of information available in multiple languages have made cross-language communication essential to many organizations.Typically,access to large collections of information in another language requires either the assistance of a speaker of that language to formulate queries and translate the retrieved documents,or an auto-mated system for Cross-Language Information Retrieval(CLIR)and Machine Translation (MT).The former is clearly not practical as a general approach for very large collections, but automated systems for CLIR and MT are rapidly evolving.CLIR systems can,how-ever,often produce acceptable results by performing term weight translation between the query and the document languages,thus allowing the use of well developed term-based search techniques to identify relevant documents.The key enabler for this strategy is to enable translation at the term level by building lexical resources of term-term translation pairs.

We can acquire the needed lexical resources in three ways.Many languages now have a substantial presence on the World Wide Web,and Resnik has shown that substantial amounts of translation-equivalent documents can be found for many languages and that a translation lexicon can be constructed from such a collection using automatic techniques for term-level alignment[33].As the Web grows,this technique could be extended to an increasingly large set of languages.Similar corpus-based techniques can also be used with documents that are available in printed form[13].Corpus-based approaches are particularly useful because the learned term-term mappings have associated translation probabilities,but infrequent terms(which are highly valued by retrieval systems because of their selectivity)are rarely observed in the training data and thus rarely included in the learned translation pairs.Hand-built translation lexicons have complementary strengths and weaknesses;they usually lack translation probabilities,but they typically have far

better coverage of rare terms that searchers actually use when posing queries.

Sometimes,these translation lexicons are available directly in electronic form.For example,the translation lexicon in an MT system could be used directly for this pur-pose.Available MT systems cover fewer than?fty of the world’s thousands of languages, however.Online dictionaries are another possible source for this information,but again the number of languages served by such resources is limited.When digital resources are unavailable,printed dictionaries o?er the third source of term-term translations.More-over,bilingual dictionaries also often contain a wealth of supplemental information about morphology,part of speech,and examples of usage that can also be useful in CLIR and MT applications.

The focus of the work reported in this article is on rapid and reliable acquisition of translation lexicons from printed bilingual dictionaries,with an ultimate goal of support-ing the development of CLIR systems for languages in which other translation resources are lacking.Given a bilingual dictionary,with one of the two languages English,a scan-ner,an optical character recognition(OCR)system that is capable of processing the character set,and an operator familiar with the language in question,we can train a retrieval system for the new language in as little as48hours.To do this,we have devel-oped an automated,but user guided approach,to parameterize and learn the physical structure of the document page and the semantics of the dictionary entries.

Figure1:Examples of bilingual dictionaries.

1.2Background and Approach

Dictionaries are members of a class of documents that are designed for easy search[8]. Their structure is typically regular and repeating,and”keys”are distinguished as access points for each entry.Figure1shows some common formats for bilingual dictionaries. The format varies from simple word-to-phrase translation pairs through full descriptions that contain parts of speech,related forms,and examples of usage.Our goal is to capture the salient structure of these entries and label each element appropriately.Because of the regular structure,we are typically able to provide a relatively small number of training samples for each dictionary,and then have the system learn the features necessary for correct segmentation and labeling.

Figure2:System architecture.

A prototypical architecture for a system is shown in Figure2.The system is broken up into three main components:Dictionary Parsing,Element Tagging,and Lexicon Generation.The general philosophy we take with all of the components is to provide a trainable system that can learn from a limited number of examples.An operator will identify and tag several dictionary entries,and it is the system’s responsibility to then learn the parameters necessary to extract and tag the remainder of the dictionary. Because of the repetitive nature of these documents,a bootstrapping approach where the system gets some feedback from an operator is appropriate.It may ultimately be possible to design general document image analysis systems that are capable of learning

this analysis with unsupervised(or very minimally supervised)techniques,but for now it remains essential to have an operator who can at least understand both languages in the bilingual dictionary.We provide an intuitive interface to assist in processing the dictionary and compilation of results.The use of the resulting translation pairs by a CLIR system can be completely automated,so the overall system development e?ort is very highly leveraged,requiring only a small amount of language-speci?c human e?ort.

1.3Related Work

Work relevant to entry segmentation is typically found in the logical layout analysis literature([17]gives an overview of traditional methods),and is often divided into two tasks:physical segmentation into zones and logical labeling.The structural complexity of di?erent documents makes it di?cult to design a generic document analysis tool that can be applied to all documents.

In our system,it is essential that we are to learn a dictionary’s structure,since that structure is typically consistent throughout a given dictionary but varies widely between dictionaries.Liang et al.[19]presented a probability-based text-line identi?cation and segmentation approach.Their approach consists of two phases:o?ine statistical training and online text-line segmentation.Kopec et al.[15]applied a stochastic approach to build Markov source models for text-line segmentation under the assumption that a symbol template is given and the zone(or text columns)had been previously extracted. Under the assumption that the physical layout structures of document images can be modeled by a stochastic regular grammar,Mao et al.[24]built a generative stochastic document model to model a Chinese-English dictionary page.They use a weighted?nite state automaton to model the projection pro?le at each level of the document’s physical layout tree,and to segment the dictionary page on all levels.Lee et al.[18]proposed a parameter-free method to segment document images with various font sizes,text-line spacing and document layout structures.There are also some rule-based segmentation methods which perform the segmentation based on rules that are either manually set up by the user[17]or learned automatically by training[22,15].

Although numerous techniques have been proposed for both script identi?cation[10,

35,34,39]and font recognition[23,14,36,44,43],none focused on either word level classi?cation or on optimizing performance for a given document.For entry-level tagging, we attach semantic labels to words or phrases.Palowitch and Stewart[29]constructed a system for automatically identifying structural features in text captured from printed documents via OCR and applying Standard Generalized Markup Language(SGML) tagging,based on the Text Encoding Initiative(TEI)Document Type De?nition.They implemented the structural and presentation markup phases as an iterative process.

For tagging,Mao and Kanungo[24]used stochastic language models to automatically extract bilingual dictionary entries by manually creating a context-free grammar(CFG) and stochastic production rules for each dictionary.They demonstrated their algorithm on a Chinese-English dictionary which contained four categories.In the general case, however,manual grammar-based approaches cannot adequately accommodate the un-certainty introduced by OCR and by errors in the document analysis.Wilms[41]reports on an attempt to impose structure on a particular dictionary using a hand-built parser, adopting a mixed approach based on pattern matching and transition networks.After extensive hand-tuning,the resulting translation lexicon was95%accurate,but correcting the approaches which require that degree of hand tuning would not be suitable for the rapid application development process that we have in mind.

The remainder of this paper is organized as follows.In Sections2and3,we de-scribe the dictionary parsing and tagging processes in detail,along with examples and experimental results.We outline the implementation for the system in Section5,and we explain how the resulting translation pairs are used to perform cross-language retrieval (with further experiment results)in Section6.We conclude the paper with a discussion of what we have learned and a few words about our future plans.

2Dictionary Parsing

The?rst step in our system involves physically acquiring the page images,segmenting the images into individual entries,extracting functional characteristics of various elements of the entry,and labeling the types of extracted entries.

2.1Dictionary Acquisition

In our environment,we assume that we have obtained a copy of a printed bilingual dictionary that can be scanned,either page by page or automatically(after busting the binding).Clearly,the quality of the resulting image can signi?cantly e?ect our ability to analyze the dictionary,so care must be taken to ensure quality.Images are stored uncompressed at300dpi,and adjustments are made to accommodate paper thickness and print intensity in order to optimize contrast,minimize bleed through,and preserve subtle di?erences in font properties.

2.2Entry Segmentation

Entry segmentation addresses the general problem of identifying repeating structures by learning the physical and semantic features that characterize them.Unlike traditional page segmentation problems where zones are characterized by spatial proximity,we of-ten?nd that publishers of documents with multiple entries(dictionaries,phone books, and other lists)use di?erent font properties(bold,italics,size etc)and layout features (indentation,bullets,etc.)to indicate a new entry.Although such characteristics vary for di?erent documents,they are often consistent within a single document,and hence the task suggests learning techniques.

2.2.1Overview

The goal of entry segmentation is to segment each page into multiple(sometimes partial) entries or alternatively to organize multiple lines of text as a single entry.Since the extraction of text lines in dictionaries is relatively straightforward,the problem of entry segmentation can be posed as the problem of?nding the?rst(or last)line of each entry.

A signi?cant contribution to the e?ectiveness of entry segmentation results from appli-cation of a bootstrap technique for the generation of new training samples.Bootstrapping helps to make the segmentation adaptive,progressively improving segmentation accuracy. Figure3illustrates the approach,with each iteration consisting of:

?Feature Extraction and Analysis:extraction of physical characteristics which

indicate an entry boundary.

?Training and Segmentation:learning the parameters of an entry segmentation model.

?Correction and Bootstrapping:feedback from an operator,who makes cor-rections to a small subset of the results that contain https://www.doczj.com/doc/1b2154864.html,ing the corrected segmentation results,bootstrapping samples are generated and used to retrain the system.

Figure3:Entry segmentation system architecture.

2.2.2Feature Extraction and Analysis

Based on a study of di?erent types of structured documents,the following features have been found to be useful for both segmentation and,ultimately,for tagging.Examples are shown in Figure4.

Special symbols:Punctuation,numbers,and other nonalphabetic symbols are often used to start a new entry,to end an entry,or to mark the continuation of a text line.

Word font,face and size:Word font,face and size(especially the features of the ?rst word in each entry)are often important entry features.On a dictionary page,

for example,the?rst word of each entry(typically the headword)can be bold,all upper case,a di?erent font,or larger than the rest of the entry.

Word patterns:Words often form distinguishable patterns which are used consis-tently to convey the structure of an entry.

Symbol patterns:Sequences of symbols sometimes form consistent patterns that represent the beginning or ending of an entry.

Line structures:The indentation,spacing,length,or height of text lines in an entry are represented within our system as“line structures”that may have di?erent characteristics at the beginning or end of an entry.

Other features:Additional features such as regularities in the spacing between adjacent entries,text position on the page,script type,word spacing,and character case may help with interpretation of the principal features described above.

Figure4:Examples of features.

Features Weight

First Word Symbol16.04

Word Style Pattern10.3

Ending Symbol15.46

Symbol Pattern 5.77

Word Symbol Pattern 5.77

Line Structure16.67

Table1:Features&weights of dictionary in Figure4.

During the training phase,a Bayesian framework is used to assign and update the prob-abilities of extracted features.Based on estimated probabilities,each extracted feature

will be assigned a weight that can be used to compute the entry score from all extracted features.The detailed procedure is as follows:

(1)Construct a histogram of feature occurrences in the training samples;

(2)Compute the feature occurrence rate as the empirical feature probability.Suppose

there are N training entries,and there are K extracted features.Then for feature i(1≤i≤K),the probability can be computed as:p i=K i/N,where K i is the number of occurrence of feature i;

(3)Assign feature weights based on this empirical probability as follows:w i=p i

×100,

A where A= K i=1p i and1≤i≤K.Consider the extracted features as a formed feature space,each entry is projected to this space and a“voting score”is computed as:F V= K i=1w i S i,where S i=1if feature i appears,otherwise S i=0;

(4)Obtain the minimum,maximum,and average voting scores of entries;these values

will be used as thresholds in the segmentation stage.

Table1shows an example of extracted features and assigned weights,where line structure(with heaviest weight)is the most important feature.

2.2.3Training and Segmentation

Because of the complexity of many structured documents,it would be di?cult to man-ually determine the optimal value for some parameters.So,given a small training set, we attempt to learn appropriate parameters.One way to do that is to extract all pos-sible features from the training set,segment some pages based on the learned features, and generate a new training set from the original set plus selected initial segmentation results.This technique used to generate new”bootstrap”samples was?rst proposed by Efron[9].

Segmentation is an iterative procedure that maximizes the feature voting score of an entry(computed using the formula described previously)in the feature space.Based on the features extracted in the feature extraction phase,a document can,in principle,be segmented into entries by searching for the beginning and ending text-lines of an entry.

This search operation is a threshold-based iterative and relaxation-like procedure,and the threshold is estimated from the training set.The segmentation procedure can be performed as follows:

(1)Search candidate entries for the?rst text line in one zone by feature matching.

This operation is equivalent to determining whether the?rst line in one zone is the beginning of a new entry or a continuation of an entry from the previous zone or previous page.

(2)Search for the end of an entry.This operation is actually accomplished by searching

for the beginning of the next entry,since the beginning of a new entry ends the previous entry.

(3)Remove the extracted entries,and iterate until all new entries are identi?ed.

2.2.4Correction and Bootstrapping

In order to correct a system-hypothesized segmentation,the operator can performing one or more operations on any segment:split one segment into two or more individual segments,merge two or more adjacent segments into a single segment,resize a segment to include or exclude text-lines or columns,move or change the position of a segment (equivalent to resizing both the beginning and end of a segment,retraining the same number of text-lines),remove a segment or relabel a segment.

We applied the procedure described in[20]to generate bootstrap samples in such a way that no entry is selected more than once.Generated bootstrap samples are the linear combination of the training samples in source,based on random weights.The result is a set of dictionary entries that represent a physical partition of the page.A similar training and labeling process is applied to each entry on the page to label,for example,entries which are a continuation from a previous page or column.Optical Character Recognition is also applied.

Figure 5:Entry segmentation results,English-French dictionary.Document

Page No Total Entries Correct Entries Incorrect Entries False Alarm Mislabeled Entries Missed Overlapped Merged Split EnglishFrench

6352017496.11%0.005% 1.00% 2.62%0.26%0.21%0.80%FrenchEnglish

75242397.90%0.04% 1.61%0.25%0.21%0.25%0.49%EnglishTurkish

96351799.26%0.00%0.17%0.11%0.45%0.23%0.31%TurkishEnglish

70265498.98%0.04%0.26%0.00%0.72%0.08%0.38%CebuanoEnglish 50215299.21%0.00%0.14%0.56%0.00%0.00% 4.46%

Table 2:Segmentation results.

2.2.5Experimental Results of Segmentation

Our system have been developed and implemented in C++with a Java-based interface.In this section,we highlight entry segmentation results.The segmentation approach was applied to ?ve dictionaries with di?erent structural characteristics:French-English (613pages),English-French (657pages),Turkish-English (909pages),English-Turkish (1152pages),and Cebuano-English (303pages).Figure 5shows the results for the English-French dictionary.The performance is shown in Table 2,which was based on the available ground truth for these dictionaries.Figure 6shows the performance improvement that results from bootstrapping.The evaluation is performed on 50pages from each dictionary.The initial segmentation was based on only four entry samples.Iterations following the initial segmentation added di?erent numbers of training entries to generate bootstrap samples.The chart in Figure 6shows that the segmentation can be re?ned step by step by applying the bootstrap technique.

Figure6:Progressive performance improvement from bootstrapping.

2.3Functional Labeling

2.3.1Trainable Classi?er Design

Functional properties are properties of words or groups of words that are often used by publishers to implicitly label semantic roles.For example,changes in font-face or font-style are often used to indicate the intended interpretation of speci?c strings such as pronunciation information,part of speech,or an example of usage.Similarly,the script itself can be considered to be a functional property.Working with documents containing both Latin and non-Latin text requires us to be able to identify scripts before applying an appropriate OCR algorithm.These types of features prove to be extremely useful for both entry recognition and for tagging the constituents of an entry.

Unfortunately,most OCR systems are designed to detect standard variation in these properties.Recognition accuracy depends heavily on the quality of images.Even within the same dictionary,the bold face text may vary signi?cantly among di?erent images,so we use a trainable approach to recognize the font face.Because of the limited number of functional properties,it is very easy for us to provide samples to design an optimal classi?er.It should be noted that all the script identi?cation approaches mentioned in the related work are at the block or page level,and the font recognition method in[43] attempts to identify the”primary”font on the page.Obviously,this does not model

the case of bilingual dictionaries well,since words in di?erent languages are typically

interspersed;script identi?cation must therefore be done at the word level.In our work, we perform that classi?cation based on a Gabor?lter analysis of texture.

The use of Gabor?lters to extracting texture features from images is motivated by: (1)the Gabor representation has been shown to be optimal in the sense of minimizing the joint two-dimensional uncertainty in space and frequency[5];and(2)Gabor?lters can be considered as orientation and scale tunable edge and line detectors,and the statistics of these micro-features in a given region are often useful for characterizing the underlying texture information.The system to classify scripts,font-styles and font-faces is shown in Figure7.

Figure7:Flowchart of classi?cation system.

Figure8:Examples of image replication:(a,c)before;(b,d)after.

We created a set of training samples for each class by randomly picking entries from a large set of words belonging to that class.Words in di?erent scripts,di?erent font-styles, and di?erent font-faces in the same script have di?erent dimensions(width&height), so we use word image replication to generate a new image with prede?ned size(64x64 pixels in our case),all features are extracted at that image size.For computation,the

Scripts NNM SVM

R avg R min R max R avg R min R max Arabic/Roman73.27%67.42%82.91%90.43%85.14%94.17% Chinese/Roman84.08%73.36%89.55%90.10%82.68%94.6% Korean/Roman83.49%78.98%87.03%90.91%86.88%93.57% Hindi/Roman92.08%87.53%100%93.28%87.95%96.73% Table3:Script identi?cation performance comparison.

Normal Bold Italic Total No2504044

Identi?ed1843539

Accuracy73.6%87.5%88.6% Table4:Font-face identi?cation.

Total No Times Arial Detection Times39397100% Arial3973282.1% Table5:Font-style identi?cation.

replication is to a power of two such that a Fast Fourier Transform can be applied to compute the Gabor?lter features.Figure8shows word replication examples of two di?erent scripts(Arabic and English).

After generating a word images,multi-channel Gabor?lter technique was applied to extract the texture features of each class.We used the same pairs of isotropic Gabor ?lters as in[43]to extract16channels of texture information.We implemented two classi?ers,one using Nearest Neighbor Matching(NNM)[21]and one using a support vector machine(SVM)[3].

2.3.2Results of Functional Labeling

We applied the script identi?cation approach to Arabic-English,Chinese-English,Hindi-English,and Korean-English bilingual dictionaries.Features for each script class were extracted from samples on a single page,and the two classi?ers were trained using the same samples.Table3shows the performance comparison,where the results were based on20(32for Chinese)pages of identi?cation results.The comparison shows the SVM-based script identi?cation works much better than the NNM-based technique,with the SVM achieving an average accuracy above90%.The font-face classi?cation approach was applied to the identi?ed English(Roman)document images,and one page of the identi?cation results for the Korean-English dictionary is shown in Table4.In order to test the e?ectiveness of this classi?er on font-styles,we applied it to an arti?cial test document image.The classi?er was trained by words taken from a di?erent document

Headword(Lexicon)Translation

Pronunciation Tense

Part of speech(POS)Gender

Plural Form Number

Domain Context

Cross reference Language

Antonym Derived word

Synonym Derived word translation

In?ected form Example

Irregular form Example translation

Alternative spelling Idiom

Explanation Idiom translation

Table6:Some categories typically found in bilingual dictionaries.

(each with10training words).Table5gives the classi?cation result.Trainable techniques are also applied to recognize special symbols and to correct symbols the OCR fails on.

3Tagging

The functional segmentation module provides the entry,entry type,list of tokens1in the entry,properties(font,font face,script,etc.)of each token,and OCR results.Once we have segmented the dictionary into individual entries,we need to tag each token (or group of tokens)with the role of element in the entry,which we refer to as the “linguistic category.”As with entry segmentation,tagging relies on the existence of a repeated semantic structure.Unfortunately,however,the semantic structure of the entries typically varies a great deal more than the physical structure of the entries.

Our approach to tagging is to use(1)the functional properties of tokens,such as font, font face,font style and their relationships to each other,(2)keywords(common terms or symbols,often de?ned in the preface to the dictionary),and(3)separators(typically punctuation)to tag the entries in a systematic way.The key is to be able to discover the regularities in the occurrences of these clues and make use of them to assign each word to a linguistic category.A list of categories that are commonly found in bilingual dictionaries is shown in Figure6.Typically,only a subset of those categories is used in a given dictionary,but our system is extensible at runtime to add new categories for additional constructs.

1We de?ne a token as a set of glyphs in the OCR output that is separated by white space.We refer to a group of one or more tokens as an element.

Figure9:French-English dictionary.

Publishers typically use a combination of methods to indicate the linguistic category. As illustrated in Figure9,functional properties(changes in font,font style,font-size, etc.)can be used to implicitly indicate the role of an element,keywords can be used to make that role explicit,and various separators can be used to organize them.For instance,headwords may be in bold,examples of usage in italics,the keywords adj,n or v may be used to explicitly represent the part-of-speech,pronunciation may be o?set with brackets,commas may be used to separate di?erent examples of usage,and a numbering system may be used to enumerate variant forms.

Since there can be errors in both functional labeling and OCR,we need methods that are robust to local variations.We have therefore developed two complementary approaches.The?rst approach is rule-based,which is instrumental in understanding the structure of the dictionary(Section3.1).The second is a Hidden Markov Model (HMM)-based approach that aids in overcoming errors introduced during segmentation (Section3.2).The overall tagging architecture is shown in Figure10.

3.1Rule-Based Approach

For the rule-based approach,an operator uses explicit knowledge of the structure of the dictionary entries to select an inventory of tag types and to indicate how they typically

appear in the scanned dictionary.The operator can select from the pre-de?ned list of

Figure10:Architecture for entry tagging.

possible categories(linguistic meanings),and if a category is found in the dictionary but not on that list,the new category can be added.Similarly,physical attributes are selected from a list(normal,bold,italics,etc).Keywords are identi?ed in a table provided by the operator,typically based on the legend at the beginning of the dictionary.For each keyword,we associate a single linguistic meaning.Separators(or symbols)and their functions are de?ned using the operators shown in(Table7).Typically InPlaceOf and Contains operands are su?cient,but in some dictionaries,spatial proximity and relative position are important,so additional operators are provided for such cases.

The tagging proceeds as https://www.doczj.com/doc/1b2154864.html,ing the font style and separator as dividers,the tokens in an entry are grouped into elements that will be tagged as a unit.The resulting elements are spans that are(1)rendered in a consistent font style,(2)identi?ed as one of the given keywords,and/or(3)delimited by one of the separators from the given set.In Figure9,for instance,the tokens to,be,recommended are grouped together as a single element since there is no separator between them,the token before them ends with the symbol’]’,they end with a semi-colon,and they have the same font.Each group of tokens will be assigned a single linguistic category.Because uncertainty in the document image analysis process leads to errors in the segmentation,several rules can be created for each category.For instance,there are some cases where the separators are recognized incorrectly,so we might choose to indicate the pronunciation begins with either”(”or

”[”.

After the tokens in an entry are grouped,the tagging process attempts to associate a single tag with each group.In this process,we again make use of font styles,keywords, and separators.We?rst check to see whether the group we are considering is in the list of keywords and if it is,we tag it accordingly.Otherwise,we use the separators and the font information.Here,we arrange the categories in a precedence order,?rst checking for translated tokens in the primary language(a derived form of the headword, an example,or an idiom),then the translations,and?nally the supplemental categories, such as pronunciation,part-of-speech or gender.This order is tuned to our follow-on task, extracting term-term translation pairs,but it could be adapted for other applications.

Operand De?nition Example

?cat?InPlaceOf?sym?Used as a shortcut for a category InPlaceOf headword～

?cat?StartsWith?sym?Category begins with this separator pronunciation StartsWith[

?cat?EndsWith?sym?Category ends with this separator translation EndsWith;

?cat?PreviousEndsWith?sym?Previous category ends

with this separator translation PreviousEndsWith?comma?

?cat?Contains?sym?Category contains this separator derived Contains?dot?

Table7:Operators used to model separators(?cat?=category,?sym?=symbol).

When applying operators,all must be valid for the category to be assigned to the element.This strict approach makes it practical to rapidly write robust rules without tuning their?ring order(e.g.,when font is the only feature that distinguishes between one pair of categories).If no category satis?es all of the constraints,operators are tried individually until a match is found.If we still cannot decide the category based solely on the observed features and the available rules we assign a”miscellaneous”tag to the element(this tag can also be assigned by a rule if desired).As the last step of the rule-based tagging process,we apply a cleanup process in which some punctuation and other nonalphabetic characters are removed by default.The operator can also specify additional characters to be cleaned if necessary.

3.2Statistical Learning Approach

For dictionaries with highly regular structures and fairly accurate OCR,the rule based system e?ectively leverages human abilities to generalize from a limited number of exam-

计算机基本技能测试培训方案范文

基本技能培训方案一、培训目的做好教育部11月份对我院学生基培养学生的计算机基本操作能力和实际使用能力，本技能测试工作。二、培训对象 07级非计算机专业全体学生（预科班和自考班除外），共33个教学班三、培训工作小组长：李伟舵组副组长：彭自然魏振西员：刘文涛陈丽佳吴文胜陈宇朱红陈兰成向征齐锋华龚芝四、总体方案 1、培训师资：由计算机系教师精英组成。 2、考核办法：教务处会同计算机系对任课教师考评，根据到课率和考试成绩评定 A、B、C三等，学生考试通过率未达到90%的教师只能评为C等。A等课时费上浮 50%，B等获标准课时费，C等课时费下浮50%。 3、测试（即培训中期和培训末期两次测试）安排：第一次测试安排在9月25日晚上进行。由培训教师从每个班挑选5名成绩优秀的学生参加，分成三个机房进行测试。第二次测试安排在10月18日、19日和20日晚上进行，全体学生均参加。两次考试成绩都将记入教师考核成绩，考试用的试题由学生从试题库中随机抽取，并采取在线考试方式。 4、培训方式：培训以系为单位，采取以集中培训为主，分散培训为辅的方式进行。集中培训时间为每周一、二、三的晚上，每次以11个班为单位，周四、五、日晚对上一次的课进行复习。两次集中培训均安排在科技楼5、6楼机房进行。分散培训要求各系辅导员组织学生在集中培训以外的时间进行。此外，教师充分利用网上答疑、电话答疑、课堂答疑和办公室答疑等多种方式，解答学生的疑难。 5、软件清单：Windows2000Professional或者Windows XP Professional、Office2000（完全安装）、Office2000XP、Visual Basic6.0等，且能够上网。（学院机房需提前安装好培训所需软件）6、试题来源：为确保基本技能测试顺利通过，由计算机系老师根据评估要求且结合学生实际情况，精心编制5套难易适中的培训试题，构成计算机基本技能试题库。 7、培训时间：2008年9月8日—10月20日。五、培训大纲计算机应用基本操作是作为被评估学校的基本技能的必测内容。对高职学生而言，考．核其计算机应用的基本能力，应重在操作和应用。即操作考核为主，理论考核为辅。高职学生应掌握如下的计算机基础知识和基本应用能力。 1、掌握计算机应用的基础知识这些基础知识包括：计算机概述、微型计算机的硬件和软件组成、数据在计算机中表

计算机基础练习与答案

《计算机基础》练习题第1～3章一．选择题 ⑴计算机系统由( B )两部分组成。 A、主机系统和显示器系统 B、硬件系统和软件系统 C、主机和打印机系统 D、主机系统和视窗95系统 ⑵操作系统是( B )。 A、应用软件 B、系统软件 C、计算软件 D、字表处理软件 ⑶微型计算机的更新与发展，主要基于（ B ）的变革。 A、软件 B、微处理器 C、存储器 D、磁盘的容量 ⑷下面（ B ）组设备包括输入设备、输出设备和存储设备。 A、CRT、CPU、ROM B、鼠标器、绘图仪、光盘 C、磁盘、鼠标器、键盘 D、磁带、打印机、激光打印机 ⑸内存空间地址段为2001H～7000H，则其存储空间为（ C ）KB。 A、20480 B、20.48 C、20 D、5 ⑹电子计算机主要是以( B )划分发展阶段的。 A、集成电路 B、电子元件 C、电子管 D、晶体管 ⑺最早的计算机的用途是用于( A )。 A、科学计算 B、自动控制 C、系统仿真 D、辅助设计 ⑻计算机唯一能直接识别、执行的计算机语言是（ A ）。 A、机器语言 B、汇编语言 C、BASIC语言 D、Ｃ语言 ⑼在GB2312-80汉字系统中，计算机把一个汉字表示为（ D ）。 A、汉语拼音字母的ASCII代码 B、十进制数的二进制编码 C、按字形笔划设计的二进制码 D、两个字节的二进制编码 ⑽调制解调器的作用是在计算机网络中对输入、输出信号作（ D ）。 A、压缩和解压缩 B、加密与解密 C、加速与减速 D、数模转换 ⑾第（ C ）代计算机开始使用操作系统。 A、一 B、二 C、三 D、四 ⑿下列不同进制数从小到大排列顺序正确的是（ C ）。 A、1011001<75<0X2A<037 B、75<1011001<037<0X2A C、037<0X2A<75<1011001 D、0X2A<037<75<1011001 ⒀在Word中，要扩大某段（多行）文字间的间距，可以先选择该段文字，再利用［格式］菜单的（ D ）命令进行设置。 A、[分散字符] B、［文字方向］ C、［分栏］ D、［字体］ ⒁在Word中，段落“缩进”后打印出来的文本，其文本相对于打印纸界的距离为（ D ）。

《计算机基础》练习及答案

《计算机基础》练习及答案一、单选题 1.能够准确反映现代计算机的主要功能是 B 。 A.计算机可以实现高速运算 B.计算机是一种信息处理机 C.计算机可以存储大量信息 D.计算机可以代替人的脑力劳动 2.按照计算机的用途分类，可将计算机分为 D 。 A.通用计算机、个人计算机 B.数字计算机、模拟计算机 C.巨型计算机、微型计算机 D.通用计算机、专用计算机 3.工业上的自动机床属于 C 。 A.科学计算方面的计算机应用 B.数据处理方面的计算机应用 C.过程控制方面的计算机应用 D.人工智能方面的计算机应用 4.如果(52)(2A)16，则x为 B 。

A.2 B.8 C.10 D.16 5.下列数中最大的数是 D 。 A.(1000101)2 B.(107)8 C.(73)10 D.(4B)16 6.已知：3×4＝14，则4×5＝ A 。 A.24 B.26 C.30 D.36 7.字节是数据处理的基本单位，1＝ C 。 A.1 B.4 C.8 D.16 8.存储器容量大小是以字节数来度量，1＝ B 。 A.1000B B.1024B C.1024× 1024B D.1048576B 9.假设某计算机的字长为8位,则十进制数(-66)10的反码表示为 C 。 A.01000010 B.11000010

C.10111101 D.10111110 10.假设某计算机的字长为8位,则十进制数(+75)10的补码表示为 A 。 A.01001011 B.11001011 C.10110100 D.10110101 11.假设某计算机的字长为8位,则十进制数(-75)10的补码表示为 D 。 A.01001011 B.11001011 C.10110100 D.10110101 12.我国信息交换用汉字编码字符集-基本集是 C 。 5 2312 13.在下面关于字符之间大小关系的说法中，正确的是 B 。 A.6>b>B B.6>B>b >B>6 >b>6 14.已知：“B”的码值是66，则码值为1000100的字符为 B 。

计算机操作员培训大纲

计算机操作员培训大纲一、培训目标 1、初步具有信息的获取、传输、处理、发布等应用能力，能运用常用的信息处理软件解决学习、工作、生活中的问题；具有信息社会的安全意识与自律能力。具有良好的信息素养，为终身学习和持续发展打下一定的基础。 2、掌握常用信息技术工具的操作方法；具有信息收集、判断、筛选、整理、处理、传输、表达的能力。体验应用信息技术解决问题的过程与方法。能根据需要自主选择相关的信息技术工具，更有效、更有针对性地使用相关的信息技术工具；迅速有效地获取信息、甄别信息、归纳信息；有主见地正确评价利用信息技术工具完成的作品等。培养学习和使用信息技术正确的情感、态度和价值观。具有较强的信息社会的安全与道德意识，在信息技术应用过程中能按照社会公认的道德规范和法律准则进行个人自律，自觉抵制不良信息；明确在信息社会中个人对社会所应承担的义务和责任，具备在信息社会中学习、工作、生活的习惯和规范，具有为社会服务并贡献社会的强烈意识。 3、通过对计算机操作员的培训，使计算机操作员全面掌握本等级的技术理论知识和操作技能，掌握计算机操作员岗位的职业要求，为参加本岗位的鉴定做好准备。其培训的具体要求是掌握微机操作的基本技能、Windows操作系统的基本技能、掌握Word、Excel 、PowerPoint软件、因特网及其他相关操作。

二、培训要求 1、本课程以“任务驱动”，突破系统学习应用软件的理论和操作的框架，应用多项信息技术解决实际问题，培养学员综合应用信息技术解决问题的能力，构建以提出“任务”、分析“任务”、完成“任务”为主线的能力培养的教学与考核体系。 2、由于本课程的教学目的不仅是让学员掌握信息技术的基础知识和基本技能，更重要的是提升学员的信息素养，培养应用信息技术解决实际问题的综合能力，因此，在教学过程中，要注重改变教学方法，充分发挥教师的主导作用和学员的主体作用。教师必须重视现代教学理论的学习，不断地更新观念，加强信息技术与各课程的整合的研究，充分应用信息技术，运用项目教学法，探索在数字化学习环境下的新型的教学模式，为学员提供自主发展的时间和空间，努力培养学员的创新精神和实践能力，自觉地成为学员学习的引导者和促进者。学员必须重视提升自己的信息素养，培养自己应用信息技术解决问题的综合能力，张扬自己的个性特长，在学习过程中学会与人合作，自觉地成为问题的发现者和解决者。 3、课程设置与课时分配表

计算机基础知识练习答案版

计算机基础知识练习（B）1.计算机发展阶段的划分是以（）作为标志的。 A)存储器 B)逻辑元件 C)程序设计语言 D)运算速度（A）2.世界上第一台电子计算机所采用的电子元件是（）。 A）电子管 B）集成电路 C）晶体管 D）大规模及超大规模集成电路（B）3.第二代计算机使用的电子元件是（）。 A）电子管 B）晶体管 C）中小规模集成电路 D）大规模集成电路（B）4.世界上第一台电子计算机的名字叫（）。 A）EDVAC B）ENIAC C）EDSAC D）MARK-II （A）5.第一台电子计算机ENIAC诞生于（）年。 A）1946 B）1958 C）1964 D）1978 （C）6.以下不．属于计算机的特点的是（）。 A）记忆能力强 B）计算速度快 C）能完成任何工作 D）具有一定的逻辑判断能力（A）7.计算机辅助设计的英文缩写是（）。 A）CAD B）CAT C）CAI D）CAM （C）8.计算机辅助教学的英文缩写是（）。 A）CAD B）CAT C）CAI D）CAM （C）9.利用计算机进行数据编辑加工、制表、查询、统计等工作，属于计算机的（）应用领域。 A）实时控制 B）科学计算 C）信息处理 D）人工智能（D）10.我们通常使用的笔记本电脑属于（）。 A）巨型机 B）小型计算机 C）工作站 D）个人计算机（D）11.个人计算机属于（）。 A）小型计算机 B）大型计算机 C）中型计算机 D）微型计算机（C）12.一个完整的计算机系统是由（）组成。 A）主机和外部设备具 B）系统软件和应用软件 C）硬件系统和软件系统 D）主机和应用程序（A）13.微型计算机的硬件系统包括（）。 A）控制器，运算器，存储器和输入输出设备 B）控制器，主机，键盘和显示器 C）主机，电源，CPU和输入输出设备 D）CPU，键盘，显示器和打印机（D）14.在微型计算机中，访问速度最快的存储介质是（）。 A）优盘 B）硬盘 C）光盘 D）内存（D）15.在微型计算机中访问下面几个部件时,速度最快的是哪个部件（）。 A）硬盘 B）软盘 C）显示器 D）主存储器（C）16.存储器可分为（）两类。 A）RAM和ROM B）硬盘和软盘 C）内存储器和外存储器 D）ROM和Cache （D）17.在计算机中，CPU对内存储器只读不写的部件是（）。 A）RAM B）Cache C）磁盘 D）ROM （D）18.在构成计算机的基本部件中,（）与运算器通常合称为中央处理器（CPU）。 A）主存储器 B）辅存储器 C）电源 D）控制器（B）19.计算机的指挥中心是（）。 A）运算器 B）控制器 C）存储器 D）I/O设备（C）20.在微型计算机中,运算器的主要功能是进行（）。 A）算术运算 B）逻辑运算 C）算术逻辑运算 D）初等函数运算

计算机操作技能训练报告格式

计算机操作技能训练报告格式淮阴工学院《计算机操作技能训练》报告系（院）：计算机与软件工程学院_______________ 专业:物联网专业班级:物联网1161 学生姓名： _____________ 学号：______________________

任课教师: _________ 王红华肖绍章______________________ 学年学期：2016 ?2017 学年第1 学期 2015 年_11_月31 日

目录 1远程登录 (1) 2 制作FTP (10) 3信息检索 (14) 4word 应用 ........................................... .错误!未定义书签。 5EXCEL应用........................................... 错误!未定义书签。 .错误!未定义书签 7邮件合并应用 ....................................... 错误!未定义书签总结............................................... 错误!未定义书签

此为正文第i 章i 级标题，黑体，小三号，加粗, 倍^ 行距，段前、段后距均为 0.5行。下同。 1 XXXXXXX XX （正文段落首行空两个汉字，宋体，小四号, 1.5倍行距，段前、段后距均为 0行）XXXXXXXXXX XXXXXXXXXX XXXXXXXXXXXXXXXXXXXX XXXXXXX 文XXX 黑X XXX 粗X X X 亍XXX X XX 0 X ......... 1.1.1 XXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXX ........................................... 2 XXXXXXX xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx XXXXXXXXXXXXXXXXXXXXXX ................................... 注： 1.1 1.5倍行距，段前、段后距均为留一个空格，下XXXXXX 正文2级标题，黑体，四号，加粗,

计算机基础知识练习题

计算机基础知识——练习题单选题： 1、通常人们所说的一个完整的计算机系统应包括_____。 A）运算器、存储器和控制器B）计算机和它的外围设备 C）系统软件和应用软件D）计算机的硬件系统和软件系统 2、构成计算机电子的、机械的物理实体称为_____。 A）计算机系统B）硬件计算机系统 C）主机 D）外设 3、按冯.诺依曼的观点，计算机由五大部件组成，它们是_____。 A）CPU、控制器、存储器、输入/输出设备 B）控制器、运算器、存储器、输入/输出设备 C）CPU、运算器、主存储器、输入/输出设备 D）CPU、控制器、运算器、主存储器、输入/输出设备 4、冯.诺依曼为现代计算机的结构奠定了基础，他的主要设计思想是_____。 A）程序存储 B）数据存储 C）虚拟存储 D）采用电子元件 5、第4代电子计算机使用的逻辑器件是_____。 A）晶体管B）电子管 C）中、小规模集成电路D）大规模和超大规模集成电路6、微型机中的CPU是_____。 A）分析、控制并执行指令的部件 B）寄存器 C）分析、控制并执行指令的部件和存储器 D）分析、控制指令的部件和存储器和驱动器 7、计算机软件一般包括系统软件和_____。

A）源程序B）应用软件 C）管理软件D）科学计算 8、计算机能直接执行的程序是_____。 A）源程序B）机器语言程序 C）BASIC语言程序D）汇编语言程序 9、_____是控制和管理计算机硬件和软件资源、合理地组织计算机工作流程、方便用户使用的程序集合。 A）监控程序B）操作系统 C）编译系统 D）应用系统 10、操作系统是为了提高计算机的工作效率和方便用户使用计算机而配备的一种_____。 A）系统软件 B）应用系统C）软件包D）通用软件 11、语言编译程序若按软件分类应属于_____。 A）系统软件B）应用软件 C）操作系统D）数据库处理系统 12、操作系统是对计算机的系统资源进行控制与管理的软件。这里系统资源指的是_____。 A）软件、数据、硬件、存储器 B）CPU、存储器、输入设备、输出设备、信息 C）程序、数据、输出设备、中央处理机 D）主机、输入、输出设备、文件、外存储器 13、操作系统是一种系统软件，它是_____的接口。 A）软件和硬件B）计算机和外设 C）用户和计算机D）高级语言和机器语言 14、下列4种软件中，属于应用软件的是_____。

计算机操作技能训练手册

楚雄师范学院计算机操作技能训练手册 2009年10月21日

绪言编写本手册是为了进行测试和训练学生计算机操作技能提供一个水平标准，其目的是提高学生计算机操作技能水平，使他们在本科教学水平评估过程，顺利通过计算机操作技能测试。技能测试应对策略 1．测试中遇到不熟悉的计算机方面的术语怎么处理？答案：由于技能测试都是在网络环境下进行的，所以如果一旦遇到不熟悉的计算机术语，你完全可以使用百度（https://www.doczj.com/doc/1b2154864.html,）来进行搜索。例如，你知道哪些即时通信工具，列举出来。如果你不知道“即时通信”这个术语，那么你就可以在百度的搜索栏里输入“即时通信”进行搜索。那么你就可以知道有MSN、QQ和UC等。 2．整个文件夹如何通过邮件传送？答案：首先要明白一点，文件可以以邮件附件的形式发送，但是文件夹是不能发送的，那么我们就只有把文件夹压缩成文件，这样就可以解决传送整个文件夹的问题了。 3．如何压缩文件和文件夹？答案：Windows下压缩文件和文件夹一般使用Winrar软件，如果你使用的机器上已经安装了Winrar，那么右键选择要压缩的文件（文件夹），如下图所示：

在弹出的菜单中选择“添加到压缩文件”，会出现“压缩文件名和参数设置”对话框。如下图所示：在“压缩文件名”中输入你喜欢的名字，点击“确定”按钮。就开始文件压缩，如下图所示：

压缩完成后，就会在当前文件夹中出现一个压缩文件（*.rar）.

试卷一时间：110分钟一、Windows操作题（10分）在桌面上建立考生文件夹（名称为：学号姓名专业），在该文件夹下创建3个同级的子文件夹，分别命名为wd、ecl和ppt。评分标准： 1、每个文件夹2.5分，共10分。二、Word操作题（30分）新建一个Word文档，使用学院主页的“学院概况”中，将“学院简介”一文中的“楚雄师范学院是2001年5月…楚雄彝族自治州首府楚雄市”、“学院坚持教学与科研并举…促进和服务作用日益彰显。”、“建院八年来…影响力不断攀升。”三部分内容插入到该文档中，按要求完成下列操作，并以文档名“楚雄师范学院简介”保存至文件夹wd中。 1、在该文档之上插入标题“楚雄师范学院简介”，标题文字格式设置为：黑体，小一号、加粗，居中。 2、将文档中各段的文字格式设置为：仿宋－GB2312、小四号、字体颜色为黑色； 3、正文第一段设置为首字下沉2行；其余各段段落设置为首行缩进2字符。 4、文档的纸型设置为A4；将文档的页面边距设置左、右边距3厘米。三、Excel操作（25分） 1、新建Excel文件，以“09年招生计划统计表”为文件名保存至ecl文件 2、使用学院校园网中的“招生就业”网“招生信息”中查找“楚雄师范学院2009年分省分专业招生计划”，并将相关数据填入工作表中； 3、为工作表添加标题：“09年招生计划统计表”，设置为：楷体_GB2312、18磅、红色，合并居中； 4、用公式或函数计算外省所占比例（保留整数部分）、合计；

大学计算机基础练习题(含答案)

判断题 1.第一代计算机的主要特征是采用晶体管作为计算机的逻辑元件。（×） 2.第二代计算机的主要特征是采用电子管作为计算机的逻辑元件。（×） 3.美国Intel公司推出的第一个微处理器芯片是Intel 8086。（×） 4.以Intel 4004为核心的电子计算机就是微型计算机，简称为微机。（√） 5.对量子计算机的研究，主要目的是解决经典计算机中的存储容量问题。（×） 6.计算机的处理能力主要由两个方面来决定：一是计算机部件的运算速度，二是部件排列的紧密程度。（√） 7.?诺依曼计算机的基本工作过程是在控制器的控制下，计算机自动地从存中取指令、分析指令再执行该指令，接着取下一条指令，周而复始地工作。（√）8.第一台具有“存储程序”思想的计算机是1946年诞生的，其名称为ENIAC。（×） 9.未来计算机可能朝着量子计算机、光子计算机和生物计算机等方向发展。（√） 10.一个完整的计算机系统由硬件系统和软件系统两部分组成。（√） 11.软件逐步硬件化是计算机的发展趋势。（√） 12.当代计算机基本属于?诺依曼体系结构。（√） 13.第三代计算机的主要特征是采用集成电路作为计算机的逻辑元件。（√） 14.第四代计算机的主要特征是采用大规模集成电路作为计算机的逻辑元件。（√） 15.总线是连接计算机外部设备的一组私有的信息通路。（×） 16.第一台PC机是由IBM公司推出的。（√） 17.按照目前计算机市场的分布情况来分，计算机可以分为大型计算机、微型计算机、嵌入式系统等。（√） 18.一体微机计算机属于嵌入式系统的畴。（×） 19.生物计算机具有体积小、功效高、能自我修复、能耗低、没有信号干扰的特点。（√） 20.光子计算机具有无需导线，一小部分能量就能驱动、信息储存量大的特点。（√） 21.自动柜员机属于微型计算机的一种。（×） 22.个人计算机（PC机）属于微型计算机。（√） 23.R进位计数制共R个基本数元。（√） 24.八进制的基本数元是从1到8。（×） 25.(100)10和(64)16相等。（√）

计算机技能高考上机练习题

湖北省普通高等学校招收中职毕业生技能高考模拟试题（一）计算机类二、操作题 1.中英文字录入（30分） 2.Windows 7的基本操作（60分）第1题（12.0分） -------------------------------------------------------------------- 请在打开的窗口中，进行下列操作，完成所有操作后，请关闭窗口。 --------------------------------------------------------------------- 1、在考生文件夹下新建名为files，pictures，test三个文件夹。 2、在files文件夹下新建名为jngk_1.bmp、mydoc$2.dll、weme!3.exe三个文件；并将mydoc$2.dll设置为隐藏文件。 3、在test文件夹下新建名为document的文件夹，将weme！3.exe移动到该文件夹下，并设置以windows XP（Sp3）兼容模式运行，以管理员身份运行。第2题（12.0分） --------------------------------------------------------------------- 请在打开的窗口中，进行下列操作，完成所有操作后，请关闭窗口。 --------------------------------------------------------------------- 1、在考生文件夹下为C:\Windows\System32\whoami.exe程序新建名为“我是谁”的快捷方式。 2、在c:\windows下搜索文件VB5DEF.HLP，将找到的该文件复制到题1的test文件夹下。第3题（12.0分） --------------------------------------------------------------------- 请在打开的窗口中，进行下列操作，完成所有操作后，请关闭窗口。 --------------------------------------------------------------------- 1、任务栏外观设置为“使用小图标” 2、自动隐藏任务栏 3、在[开始]菜单中，设置“存储并显示最近在[开始]菜单中打开的程序”

计算机基础知识练习试题及答案

计算机基础知识练习试题及答案下面是小编收集整理的计算机基础知识练习试题，希望对您有所帮助!如果你觉得不错的话,欢迎分享! 计算机基础知识试题： 1、世界上首先实现存储程序的电子数字计算机是_A___。 A、ENIAC B、UNIVAC C、EDVAC D、EDSAC 2、计算机科学的奠基人是__B__。 A、查尔斯.巴贝奇 B、图灵 C、阿塔诺索夫 D、冯.诺依曼 2、世界上首次提出存储程序计算机体系结构的是_B___。 A、艾仑图灵 B、冯诺依曼 C、莫奇莱 D、比尔盖茨 3、计算机所具有的存储程序和程序原理是_C___提出的。 A、图灵 B、布尔 C、冯诺依曼 D、爱因斯坦 4、电子计算机技术在半个世纪中虽有很大进步，但至今其运行仍遵循着一位科学家提出的基本原理。他就是__D__。 A、牛顿 B、爱因斯坦 C、爱迪生 D、冯诺依曼 5、 1946 年世界上有了第一台电子数字计算机，奠定

了至今仍然在使用的计算机__D__。 A、外型结构 B、总线结构 C、存取结构 D、体系结构 6、在计算机应用领域里，___C_是其最广泛的应用方面。 A、过程控制 B、科学计算 C、数据处理 D、计算机辅助系统 7、 1946 年第一台计算机问世以来，计算机的发展经历了 4 个时代，它们是__D__。 A、低档计算机、中档计算机、高档计算机、手提计算机 B、微型计算机、小型计算机、中型计算机、大型计算机 C、组装机、兼容机、品牌机、原装机 D、电子管计算机、晶体管计算机、小规模集成电路计算机、大规模及超大规模集成电路计算机 8、以下属于第四代微处理器的是__D__。 A、Intel8008 B、Intel8085 C、Intel8086 D、Intel80386/486/586 9、 Pentium IV 处理器属于__C__处理器。 A、第一代 B、第三代 C、第四代 D、第五代 10、计算机能够自动、准确、快速地按照人们的意图

计算机基本技能训练实习报告

《计算机基本技能训练》实习报告学年：实习名称：姓名：班级：学号：电话号码：指导老师：

目录 1、概述----------------------------------------------------------------------------------3 1.1实习目的-----------------------------------------------------------------------3 1.2实习手段-----------------------------------------------------------------------3 1.3实习进程安排-----------------------------------------------------------------3 2、实习过程及实习内容-------------------------------------------------------------4 2.1实习主要阶段性工作安排--------------------------------------------------4 2.2实习收获、感想、认识、评价等-----------------------------------------4 3、实习总结----------------------------------------------------------------------------5

1、概述 1.1实习目的在这个高度信息化的社会，计算机的应用日益普及，伴随着计算机的普及应用，办公自动化技术发生了革命性的演变，彻底改变了我们的工作方式和生产方式。办公自动化技术提高了信息资源的应用水平和共享程度，提高了人们的工作效率和决策水平，他是信息现代化的重要内容，作为当代大学生，我们有必要熟练掌握办公自动化的操纵技能，为今后的工作打下扎实的基础。通过实训教学，我们熟练掌握操作系统的使用及Office 组件的使用，能熟练的完成文档编辑方面的工作。强化各种自动化办公操作技能，进一步培养了熟练处理Word 文档的综合应用、Excel 高级数据管理、Powerpoint 演示文稿高级制作技巧及Internet 网络综合应用的能力。 1.2实习手段本次实习，以学生自主参阅学习、实践为主，辅导老师在旁答疑辅导为辅，并通过课后完成作业以加强对所学知识的巩固和深化。 1.3实习进程安 1月7日实习动员大会 1月8日 Word 技能教学及实践 1月9日—1月10日完成作业活动时间

计算机基础练习题及答案2016

填空题 1.在计算机内部一般设有数据总线、（）总线和控制总线。 2.一条指令包括操作数和（）两部分。 3.（）频率是影响CPU性能的重要因素之一。 4.分立离散的信号称为（）信号。 5.面向对象的程序设计语言，属于（）语言。 6.用户为解决自己的问题而编写的程序，称为（）。 7.一种计算机能执行的全部指令，就是这种计算机的指令（）。 8.操作数表示参加操作的数本身或操作数所在的（）。 9.指令是对机器进行程序控制的（）单位。 10.信息存储的最小单位是（）。 11.（）是在计算机各个部件之间传送信息的一组公共线路。 12.一个二进制位bit只有两个可能的值是（）。 13.时钟频率即主频是指CPU每秒钟能执行的条数，单位为（）。 14.用高级语言编写的程序称为（）。 15.计算机直接执行的（）指令由若干位二进制代码组成。 16.计算机指令的基本工作过程包括若干个步骤（如分析指令、执行指令等），而第一步是（）。 17.程序计数器中存放的是（）。 18.最初级且依赖于硬件的计算机语言是（）语言。 19.在微机中，信息的基本存储单位是字节，每个字节内含（）个二进制位。 20.RAM是（）存储器的英文缩写。 21.用高级语言编写的源程序，经过解释或编译或编译+（）才能被计算机执行。 22.在计算机内部一般设有（）总线、地址总线和控制总线。 23.计算机内部的信息交换通道称为（）。 24.只读存储器的英文缩写是（）。 25.内存有许多字节组成，每个字节有其唯一的编码，叫做（）。 26.Windows系统在桌面上设置的系统图标，实现“浏览因特网上网站的Web网页”功能的是（）。 27.Windows系统在桌面上设置的系统图标，实现“查看和管理计算机的各种文件夹，文件和设备”功能的是（）。 28.使用热键下拉一个菜单时，应按住（）键再按一个字母键。 29.为获得Word的有关帮助信息，可以从（）菜单中选择命令。 30.在Excel 2003中已输入的数据清单含有字段：编号、姓名和工资，若希望只显示最高工资前5名的职工信息，可以使用（）功能。 31.在Excel 2003中，要求在分类汇总之前，先对关键字段进行（）。 32.在Word 2003中，模板的文件扩展名是（）。 33.在Windows XP中，各个应用程序之间可通过（）交换信息。 34.Windows XP可以按不同的方式排列桌面图标，除了自动排列方式以外，其他四种方式是按名称、按类型、按大小、按（）排列。 35.PowerPoint 2003演示文稿设计模板的默认扩展名是（）。

如何培养学生的计算机操作技能

如何培养学生的计算机操作技能《信息技术》是中小学的一门新课程,也是一门实践性很强的课程。它的目的就是通过培养中小学生掌握、利用计算机的操作技巧、技能,来存储、传输、交流知识,使知识信息化。但近年的教学使我认识到信息技术教学还存在误区和缺陷。一是信息技术教学设备落后。计算机教室共五十台电脑,每班学生都在八十几近九十人,上课学生只能采取轮换操作,学生只能进行简单的操作练习,学生没有足够的操作练习时间,很难掌握计算机操作方法及技能,教学效果差。二是学科整合工作开展不利。因为设备、师资的原因,有些学校把计算机室只作为信息技术的标志或练习室,只在进行信息技术教学时使用,其它学科教学时很少用或根本不用,学生缺少了解、操作计算机的机会。如何发挥计算机的作用,让学生充分掌握计算机的操作技能,已成为信息技术教育的重要目的。首先,学校及上级部门要加大投资增加计算机数量,保障学生上信息课达到每人一台。使每个学生都有充裕的操作练习时间。其次,要让学生端正认识,明确目的。不要让学生误认为信息技术课就是学学简单的操作,应该让他们明白:只有认真学好计算机的操作,才能更好地利用信息技术为自己的学习服务。第三,搞好学科整合。通过多媒体在各学科中的运用,让学生直观接受知识,并从中认识到计算机的巨大作用,产生对计算机的兴趣,增强学习信息技术的信心。第四,认真上好信息技术课。这也是培养学生计算机操作技能的关键。 1、课前认真检查设备。古人云:工欲善其器,必先利其器。课前检查好设备,才能保证计算机正常运转、软件运行顺利。它是上好信息技术课的前提。

2、讲、演、指导相结合。课堂上该讲的一定要讲,该说的一定要说,该演示的必须演示,不能只讲不演示,只演示不讲,成为一边重、一边倒。要通过利用投影机或控制系统,边进行操作演示,边分析、讲解操作方法、技巧,让学生直观、容易地接受、掌握操作技能;练习时分层次、有重点地进行基本技能、扩展技能的训练、指导。 3、安排好课后作业。课后作业要形式多样,要能激发学生兴趣,诸如,让学生搜集其它操作技巧、方法,总结操作经验,在下一堂课上发表、介绍。三、做好课前、课后工作。让学生养成课前预习、课后总结、复习的良好习惯;虽说一般家庭还没有电脑,但信息技术课本图文并茂,操作方法介绍详细,学生可以课前看课本、想操作,课后归纳、总结操作技巧。总之,要掌握计算机的操作,必须认真听、仔细看、多练习、善总结,既课堂上认真听老师讲解操作的技巧、关键,仔细看课文的内容和老师的操作演示;多进行操作练习,善于总结操作经验,及时交流,把老师、同学的操作技巧、经验记录下来,通过实践变成自己的技能。 2008年1月5日

计算机基础练习习题

1.信息安全的四大隐患是：计算机犯罪、______、误操作和计算机设备的物理性破坏。正确答案是：计算机病毒电子计算机之所以能够快速、自动、准确地按照人们意图进行工作，其最主要的原因是【】。 2. a. 存储程序 b. 采用逻辑器件 c. 识别控制代码 d. 总线结构正确答案是：a.存储程序 3.（）信息技术的核心是通信技术和回答正确。正确答案是：计算机技术 4.（）计算思维主要是计算数学、信息科学和计算机学科的任务，与其他学科关系不大。正确的答案是“错”。 5.（）构成计算机的电子和机械的物理实体称为外部设备。正确的答案是“错”。 6.（）计算机系统可靠性指标可用平均无故障运行时间来描述。正确的答案是“对”。 7.冯.诺依曼体系结构的计算机硬件系统的五大部件是【】 a. 输入设备、运算器、控制器、存储器、输出设备 b. 键盘、主机、显示器、硬盘和打印机 c. 输入设备、中央处理器、硬盘、存储器和输出设备 d. 键盘和显示器、运算器、控制器、存储器和电源设备

正确答案是：a.输入设备、运算器、控制器、存储器、输出设备 8.从人类认识和改造世界的思维方式出发，科学思维可以分为理论思维、实验思维和_____三种。正确答案是：计算思维 9.办公自动化是计算机的一项应用，按计算机应用的分类，它属于【】 a. 实时控制 b. 数据处理 c. 辅助设计 d. 科学计算正确答案是：b.数据处理 10.( )图灵机不能计算的问题现代计算机未必不能计算。正确的答案是“错”。 1.反映计算机存储容量的基本单位是【】字长 b. 字 c. 字节 d. 二进制位正确答案是：c.字节计算机的工作过程是【】完全自动控制过程 b. 执行指令的过程 c. 执行程序的过程 d. 执行命令的过程正确答案是：b.执行指令的过程

计算机基础练习题

【基础过关】 1、输入设备是计算机不可缺少的组成部分，人们通过它将指令或信息传送给计算机。下列设备中不属于输入设备的是（）。 A、鼠标 B、键盘 C、优盘 D、摄像头 2、U盘是我们日常生活中经常接触到的存储设备。下图中优盘应与计算机背部面板中哪一个接口相连（） A、① B、② C、③ D、④ 3、下图显示一台计算机的系统信息，说法错误的是（） A、系统版本是windows10专业版 B、CPU是英特尔处理器 C、内存大小为3.89GB D、此系统是64位操作系统 4、夏冰要购买一款家用智能多媒体终端，下图是他搜索到的产品配置参数，下列描述错误的是（）。 A、操作系统是Android 4.1 B、不支持高清的1080P输出 C、支持浏览JPEG、GIF、BMP、PNG格式的图片 D、内置Wi-Fi 802.11b/g/n网络接口

5、一个1TB的移动硬盘，能够存放大约（）首MP3歌曲（一般一首MP3歌曲约5MB）。 A、205 B、500 C、209715 D、419840 6、张强在电脑城装配了一台电脑，在装好所有的硬件后，下图所示的软件中，请问他应该首先安装那款软件？（） A、 B、 C、 D、 7、某同学在操作计算机时出现下图的错误提示，你认为是由哪个操作导致的？（） A、 B、 C、 D、

8、小明在使用Word时，发现菜单选项中有的呈灰色显示、有的后面有“Ctrl+字母”等（如图），就对其进行了研究，得出了如下结论，其中不正确的是（） A、“Ctrl+字母”是相关操作的快捷键 B、点击带“…”的选项将弹出对话框 C、带的菜单项有子菜单 D、灰色的菜单项表示软件没有这个功能 9、小明想卸载装在电脑中的某个软件，下列哪项不能实现？（） A、运用“windows优化大师”“360安全卫士”等软件的卸载程序来卸载 B、直接删除“程序”菜单中的该软件以及桌面的该软件图标 C、使用windows提供的“添加/删除程序”功能卸载 D、使用该软件自带卸载程序卸载 10、小丽不小心按下了Delete键，删除了硬盘中的一张旅游照片，下列说法中错误的是（）。 A、可以从回收站里找回 B、立即通过Ctrl+z能恢复删除的照片 C、重启电脑后照片无法找回 D、立即通过菜单中的撤消删除能进行恢复 11、小明在打开flash软件制作动画时，计算及提示程序没有响应的，我们通常应按下列哪一组键来结束应用程序的执行（） A、Ctrl+Shift+Del B、Ctrl+Del+Esc C、Ctrl+Alt+Del D、Ctrl+Alt+Esc 12、李华电脑里存放着下列四个文件，期中属于可执行文件的是（） A、泰山.txt B、植物大战僵尸.jpg C、旅行记.exe D、班级活动.wps 13、学校要召开纪念建党九十周年大会，宋老师从网上下载了歌曲《红梅赞》，当他把文件名改为歌曲名称后，发现文件的图标也发生了变化（如图），这是因为： A、文件内容与文件名不符 B、文件名中不能含有书名号 C、文件名中的扩展名被删掉了 D、文件被误删除了 14、网络的高速发展，使我们能下载大量的资料，对于文件管理而言，下列做法不恰当的（） A、将下载的文件分类存放。 B、将所有下载的文件原名保存到一个文件夹内。 C、根据见名知意原则对下载文件重命名。 D、及时删除无用的文件。

计算机操作技能测试试题

计算机操作技术技能竞赛模拟测试一、在金山打字选择一篇文章进行打字测试（10分钟）。记录时间和准确率。（10分）二、WORD2010编辑（40分）打开“素材文档.docx”，完成下列操作后以“结果文档.docx”保存到指定的考试文件夹下；完成后的效果如样图所示。（1）将文章标题“美国正要加入反微软阵营”设置成隶书三号，居中对齐，字符间加宽磅，设置段后间距行。（4分）（2）将文章正文中所有中文字符设置成楷体四号，西方字符设置成Tahoma四号。（4分）（3）将正文设置成左对齐，悬挂缩进2字符，行间间距固定为24磅。（4分）（4）将文章的最后三行加上红色的“点-点-短线下划线”。（4分）（5）为文章标题设置“全映像，4pt偏移量”，使用“新闻纸”纹理页面背景。（8分）（6）插入“新闻纸”页脚，作者处输入文字“时事文摘”，设置成隶书四号。（8分）（7）在文章中相应位置处插入剪贴画“计算机”中相应的一张图片，设置成高4厘米宽4厘米，四周型文本环绕。（8分）素材：美国政要加入反微软阵营前美国参议院多数党领袖、1996年共和党提名总统候选人bobdole与美国上诉法院前法官robertbork,4月20日宣布加入一个旨在针对微软公司反托拉斯法行为的组织,使得微软案进一步升温。这个名为"数字化时代推进竞争和创新计划"(procomp)的组织包括了来自netscape、sun、oracle、corel以及软件出版商协会、计算机和通信业联合会等机构的代表,其他非技术行业的代表还包括美国航空公司、美国旅游机构联合会、knightridder新媒体以及航空运输协会等。dole指出,internet在现今的经济生活中已经成为最新的市场,其控制权不能被任何一家公司独占。微软公司目前控制了90%以上的桌面计算机,并正从中得到垄断的好处。bork补充说,微软公司的这种做法违反了传统的竞争规则。procomp发起这项活动是在微软公司和司法部摆好架势对薄公堂的前一天。dole和bork参与此案意义重大,因为这二位都是知名的政要,享有反对政府介入干预市场的很高声誉。三、EXCEL2010电子表格操作（40分）打开“素材文档.xlsx”，完成下列操作后以“结果文档.xlsx”保存到指定的考试文件夹下。（1）计算星期一至星期五每个班的实际出勤人数，实际出勤人数=应出勤人数-本日缺勤人数，计算结果分别填入相应的列中。（6分）（2）将表格A1：B1，C1:D1，E1：F1，G1：H1，I1：J1，K1：L1合并居中。给表格区域添加上边框线，外框线添加粗匣框线。（6分）（3）设置表格自动调整行高列宽，在表格前插入一行，将A1：M1合并居中，输入表格标题“班级出勤情况登记表”，设置成黑体16磅。（6分）（4）计算本周五天的平均缺勤人数。（6分）（5）选用表格中各班级应出勤人数数据列制作折线图，添加线性趋势线；图表放置于表格数据区的下方。（5分）（6）将图表的折线修改为红线，蓝色的数据点，图表区填充黄色，绘图区填充浅绿色。（5分）（7）将本周平均缺勤人数的前三名用红色加粗斜体表示出来，值最小的二名用蓝色

计算机基础练习题大全

江苏省计算机等级考试二级计算机基础练习题第一章信息技术概述练习题 1.微电子技术是以集成电路为核心的电子技术。在下列有关集成电路（IC）的叙述中，错误的是 b 。 A.现代集成电路使用的半导体材料大多数是（Si） B.Pentium4微处理器芯片是一种超大规模集成电路，其集成度在1000万以上 C.目前PC机中所用的的电子元器件均为大规模集成电路？ D.Moore定律指出（预言），集成电路的集成度平均18~24个月翻一番 2.多路复用技术和交换技术的发展极大地提高了通信线路的利用率。在下列的一些叙述中，错误的是 b 。 A.数字传输技术采用的多路复用技术是时分多路复用技术 B.目前有线电视采用频分多路复用技术在同一电缆上传输多套电视节目 C.交换技术主要有两种类型，即电路交换和分组交换 D.采用分组交换技术传递信息的速度比采用电路交换技术快 3.在下列有关计算机中数值信息表示的叙述中，错误的是 b 。 A.正整数无论是采用原码表示还是补码表示，其编码都是相同的 B.相同位数的二进制补码和原码，他们能表示的数的个数也是相同的 C.在实数的浮点表示中，阶码是一个整数 D.从精度上看，Pentium处理器支持多种类型的浮点数 4．十进制数100 对应的二进制数、八进制数和十六进制数分别__________a___ 。Ａ．1100100 、144Q 和64H Ｂ．1100110B 、142Q 和62H Ｃ．1011100B 、144Q 和66H Ｄ．1100100B 、142Q 和60H 5．无线电波按频率（或波长）可分为中波、短波、超短波和微波。在下列关于微波的说法中，错误的是_____a________ 。Ａ．微波沿地球表面传播，易穿过建筑物Ｂ．微波是一种具有极高频率的电磁波，其波长很短Ｃ．微波通信的建设费用低（与电缆通信相比）、抗灾能力强Ｄ．微波传输技术广泛用于移动通信和高清晰度电视的信号传输等 6．几十年来，集成电路技术的发展很快，根据摩尔定律（Moore Law），在过去几十年以及在可预测的未来几年，单块集成电路的集成度平均大约每 c 个月翻一番。A. 1-6 B.6-12 C.12-24 D.24-36 7．计算机中的数值信息分成整数和实数（浮点书）。实数之所以能表示很大或很小的数，是由于使用了 a 。 A．阶码 B.补码 C.反码 D.较长的尾数 8.随着集成电路技术及其制造工艺的发展，CPU芯片的集成度越来越来高，目前Intel公司出品的pentium 4芯片，在其体积仅为几立方厘米的芯片上集成了 d 各种晶体管。Ａ．数万个Ｂ．数百万个Ｃ．近千万个Ｄ．数千万个9.在下列有关数的进制系统的叙述中，不正确的是 b 。 A．所有信息在计算机中的表示均采用二进制编码．对的 B．以任何一种进制表示的数，均可精确地用其它进制来表示． C．二进制数的逻辑运算有三种基本类型，分别为＂与＂、＂或＂和＂非＂．